AN IMPROVED ALGORITHM FOR LINEAR
INEQUALITIES IN PATTERN RECOGNITION
AND SWITCHING THEORY



Leo C. Geary, Ph.D. 

University of Pittsburgh, 1968

A new iterative algorithm is presented to solve for an n by 1
solution vector $\mathbf{w}$, if one exists, to a set of linear inequalities,
$A\mathbf{w} > \mathbf{0}$, which arises in pattern recognition and switching theory. The
algorithm is an extension of the Ho-Kashyap algorithm, utilizing the
gradient descent procedure to minimize a criterion function for a
solution of the linear inequalities. The criterion function to be
minimized is

$$J(\mathbf{y}) = 4 \sum_{i=1}^{N} \left(\cosh \tfrac{1}{2} y_i\right)^2,$$

where $\mathbf{y} = A\mathbf{w} - \mathbf{b}$ and $\mathbf{b}$ is a
vector with all positive elements. This criterion function has a
larger gradient than previously used criterion functions. The algorithm
is expressed below:

$$\begin{aligned}
\mathbf{w}(0) &= A^{\#}\mathbf{b}(0), \qquad \mathbf{b}(0) > \mathbf{0} \\
\mathbf{y}(k) &= A\mathbf{w}(k) - \mathbf{b}(k) \\
\mathbf{b}(k+1) &= \mathbf{b}(k) + \rho(k)\,\mathbf{h}(k) \\
\mathbf{h}(k) &= [h_i(k)] = [\sinh y_i(k) + |\sinh y_i(k)|] \\
\mathbf{w}(k+1) &= \mathbf{w}(k) + \rho(k)\,A^{\#}\mathbf{h}(k)
\end{aligned}$$

where k is the iteration step and $A^{\#}$ is the generalized inverse of the
N by n pattern matrix A. $\rho(k)$ can be expressed as
$\rho(k) = 1/\cosh y_{\max}(k)$ with $y_{\max}(k) = \max_i |y_i(k)|$, or as
$\rho(k)$ = Num./Den., where

$$\text{Num.} = [\mathbf{y}(k) + |\mathbf{y}(k)|]^t\, R(k)\, [\mathbf{y}(k) + |\mathbf{y}(k)|]$$

and

$$\text{Den.} = 2\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t\, R(k)\,(I - A A^{\#})\,R(k)\, [\mathbf{y}(k) + |\mathbf{y}(k)|],$$

where R is a diagonal matrix $[r_{ii}]$ with $r_{ii} = \sinh y_i / y_i$. The algorithm
also simultaneously tests for the nonexistence of a solution of the linear
inequalities, which is indicated whenever all $y_i$ are nonpositive with at
least one $y_i$ negative. This algorithm applies to two-category
classification problems.

The algorithm has a faster rate of convergence than the Ho-Kashyap
algorithm for a certain range of the initial value of $\mathbf{b}$, $\mathbf{b}(0)$. A comparison
has been made between the improved algorithm with $\rho(k)$ = Num./Den. given
above and the Ho-Kashyap algorithm with $\rho = 1$. The convergence rate is
greatly increased for $0.001 < b_i(0) < 0.5$ $(i = 1, 2, \ldots, N)$, as verified by
computer results of sample problems in switching theory and pattern
recognition. For problems where a large number of iterations, for example,
greater than twenty, were required for the Ho-Kashyap algorithm, the
proposed algorithm reduced the number of iterations by a factor of 20 to
450. The total computing time was reduced by a factor of approximately
three, and in one case by 380, with the proposed algorithm. For problems where
a small number of iterations were required by the Ho-Kashyap algorithm,
for example, less than twenty, the proposed algorithm reduced the number
of iterations by as much as 30 percent.

The generalization of the proposed algorithm applicable to multiclass
pattern classification problems has been presented and a convergence
proof has been given. The algorithm solves for an n by (R-1) solution matrix
U of a set of linear inequalities $A U (\mathbf{e}_j - \mathbf{e}_i) > \mathbf{0}$ (for all $i \neq j$ and
$j = 1, 2, \ldots, R$), where the $\mathbf{e}_j$ are the R vertex vectors of an
(R-1)-dimensional equilateral simplex. This generalized algorithm is given
in the following equations:

$$U(0) = A^{\#}B(0)$$

$$Y(k) = A\,U(k) - B(k), \qquad Z_j(k) = Y(k)\,E_j$$

$$B(k+1) = B(k) + \rho(k)\,H[Y(k)]$$
H [Y(k)] - [S,(Z(k) + A,(k)jE/^ 

$$U(k+1) = U(k) + \rho(k)\,A^{\#}H[Y(k)]$$

where again k is the iteration step,

ij(Z(k) - (jSjq(Z(k))J - [slnh jZ. (WJ. (Z-1 R-1) 

^(k) - [/jq(k)]. 

and 







$\rho(k)$ is expressed as






P(k) - I I iSjWW) 

j*l £■! 



R-i t » 

2 I h (I - A K)\i 

oil -q - q 



where 



,Cj(k) ft [j^ij(k)R(j^^(k)) + jA (k)](E^*^)'V^(j^^(k)R(j^(k)) - ^A^Wl* 



» 



R(„Z (k)) ■ a diagonal matrix [r . . (1^)) 1 
— 1—j 11 l~~2 



and 



A Slnh 

Y,,(o^(k)) = , (1-1,2,...,R-1). 

A^jq 



The proof of convergence of this multiclass algorithm utilizes the concept
of mapping the pattern classes into vertices of the equilateral simplex.
I. INTRODUCTION

A. General Background

A great amount of research on the solution of linear
inequalities has been undertaken in the past ten years. One of the
reasons for this research is the development of linear separation
approaches to pattern recognition* and threshold logic
problems. Both of these problems require the determination of
a decision function or decision functions which, in the case of
linear separation, involve a system of linear inequalities.

1. Pattern Recognition

The problem of pattern recognition requires the consideration
of three fundamental aspects: namely, characterization, abstraction,
and generalization.(13) The characterization aspect is concerned
with measurement selection and feature extraction. From the
measured patterns or raw data, a set of independent variables is
selected to describe the patterns under consideration. These
independent variables are known as primary attributes or measurements,
and are denoted by $u_1, u_2, \ldots, u_d$. These attributes can be further
processed to give a set of independent variables $x_1, x_2, \ldots, x_r$, where

$$x_i = \phi_i(u_1, \ldots, u_d), \qquad i = 1, 2, \ldots, r,$$

which adequately characterize the original patterns for the purpose
of classification. The vector $\mathbf{x}$ formed by the components $x_1, x_2, \ldots, x_r$
is called the pattern vector. The abstraction aspect is the
determination of the decision functions or discriminant functions
so as to separate the given sample patterns according to their
respective classes. This aspect is also called the training aspect.

For an R-category pattern classification problem, a set of R discriminant
functions, $g_j(\mathbf{x})$, $j = 1, 2, \ldots, R$, are to be determined from N sample
patterns of known classification such that

$$g_j(\mathbf{x}) > g_i(\mathbf{x}), \qquad i \neq j,$$

if the pattern $\mathbf{x}$ is of class $C_j$. For a two-category classification
or dichotomization problem, a single discriminant function,

$$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x}),$$

may be used so that it separates all the sample pattern vectors
into two classes. Thus the function $g(\mathbf{x})$ must satisfy the following
two inequalities:

$g(\mathbf{x}) > 0$ for all sample patterns belonging to the class $C_1$,

$g(\mathbf{x}) < 0$ for all sample patterns belonging to the class $C_2$.

The ability of the determined discriminant functions to recognize
correctly the class of new sample patterns is considered the
generalization aspect, which assesses the error rate after training.

*Parenthetical references placed superior to the line of text refer
to the bibliography.
2. Switching Problems

The switching problems referred to here are a special class
of pattern classification problems in which the primary attributes
are the r independent switching variables of the problem,
$x_1, x_2, \ldots, x_r$. Each of these variables can assume only one of
two values, which can be represented by either 0 and 1
or -1 and 1. The pattern vectors, $\mathbf{x}$'s, are the vertices of an
r-cube, which are $2^r$ in number. Every vertex of the hypercube may
belong to either one of the two classes or remain unspecified with
regard to its class. A Boolean function $g(\mathbf{x})$, known as a switching
function, is associated with every switching problem. It is a
decision function to separate the vertices into two classes. Such
a decision function can be realized by a threshold logic circuit.
Thus the switching problem is essentially an abstraction problem
and the techniques of linear inequalities have been applied to such
switching problems.(6,10,11)

B. Ho-Kashyap Algorithm

The deterministic abstraction problem for two-category
classification, as mentioned above, is to determine a decision
function, $g(\mathbf{x})$, of the pattern vector $\mathbf{x}$ such that

$g(\mathbf{x}) > 0$ if $\mathbf{x}$ belongs to class $C_1$

$g(\mathbf{x}) < 0$ if $\mathbf{x}$ belongs to class $C_2$

for all of the N sample pattern vectors. For the linear separation
of the pattern vectors $\mathbf{x}$'s, $g(\mathbf{x})$ is a linear decision function
represented by

$$g(\mathbf{x}) = w_1 + w_2 x_1 + \cdots + w_n x_r,$$

where the weight components $w_1, w_2, \ldots, w_n$ are to be determined. For
notational simplicity, let $\mathbf{x}$ be now redefined as an n by 1 augmented
pattern vector whose first component is unity and the remaining
(n-1) components are the pattern components $x_1, x_2, \ldots, x_r$ mentioned
previously, where $n = r + 1$. The transpose of $\mathbf{x}$ is

$$\mathbf{x}^t = (1, x_1, x_2, \ldots, x_r). \tag{1.1}$$

Let the transpose of the n by 1 weight vector be

$$\mathbf{w}^t = (w_1, w_2, \ldots, w_n). \tag{1.2}$$

The discriminant function for the dichotomization problem is

$$g(\mathbf{x}) = \mathbf{x}^t\mathbf{w}. \tag{1.3}$$

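Among the N sample or training patterns, let $n_1$ of them belong to
class $C_1$ and $n_2$ of them to class $C_2$, where $n_1 + n_2 = N$. They are
designated respectively by ${}_i\mathbf{x}_1$, $(i = 1, 2, \ldots, n_1)$, and ${}_i\mathbf{x}_2$,
$(i = 1, 2, \ldots, n_2)$, where the subscript on the right denotes the
pattern class and the subscript on the left denotes the ith pattern
in that class. Then the problem is to determine a weight vector
$\mathbf{w}$ such that

$$\begin{aligned}
{}_i\mathbf{x}_1^t\,\mathbf{w} &> 0 \quad \text{for } i = 1, \ldots, n_1, \\
{}_i\mathbf{x}_2^t\,\mathbf{w} &< 0 \quad \text{for } i = 1, \ldots, n_2.
\end{aligned} \tag{1.4}$$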
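Ho and Kashyap have developed an iterative algorithm to solve
for $\mathbf{w}$,(12) which is considered one of the best available algorithms.
Let A be the N by n matrix of sample patterns defined below:

$$A = \begin{bmatrix}
{}_1\mathbf{x}_1^t \\ {}_2\mathbf{x}_1^t \\ \vdots \\ {}_{n_1}\mathbf{x}_1^t \\
-{}_1\mathbf{x}_2^t \\ \vdots \\ -{}_{n_2}\mathbf{x}_2^t
\end{bmatrix}. \tag{1.5}$$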
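The construction of A in (1.5) can be illustrated with a short sketch. The Python/NumPy function name `build_pattern_matrix`, the two-variable AND example, and the hand-picked weight vector `w_guess` are illustrative assumptions that do not appear in the dissertation; the sketch only shows the augmentation by a unity component and the sign change of the class $C_2$ rows.

```python
import numpy as np

def build_pattern_matrix(class1, class2):
    """Stack augmented class-1 rows and negated augmented class-2 rows, as in (1.5)."""
    x1 = np.asarray(class1, dtype=float)
    x2 = np.asarray(class2, dtype=float)
    aug1 = np.hstack([np.ones((x1.shape[0], 1)), x1])  # prepend the unity component
    aug2 = np.hstack([np.ones((x2.shape[0], 1)), x2])
    return np.vstack([aug1, -aug2])                    # class-2 rows change sign

# Hypothetical example: two-variable AND on the vertices with coordinates -1, +1.
# Class C1 holds the vertex (1, 1); class C2 holds the other three vertices.
A_and = build_pattern_matrix([[1, 1]], [[-1, -1], [-1, 1], [1, -1]])

w_guess = np.array([-1.0, 1.0, 1.0])  # hand-picked weights, for illustration only
print(A_and @ w_guess)                # all components positive, so A w > 0 holds
```

With this sign convention the two sets of inequalities in (1.4) collapse into the single system $A\mathbf{w} > \mathbf{0}$.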
Inequality (1.4) then becomes inequality (1.6):

$$A\mathbf{w} > \mathbf{0}. \tag{1.6}$$

Let $\mathbf{b}$ be an N by 1 vector with all positive components and $\mathbf{y}$ be the
N by 1 vector defined by

$$\mathbf{y} = A\mathbf{w} - \mathbf{b}. \tag{1.7}$$

The Ho-Kashyap iterative algorithm for a solution of $\mathbf{w}$ is given by

$$\begin{aligned}
\mathbf{w}(0) &= A^{\#}\mathbf{b}(0), \qquad \mathbf{b}(0) > \mathbf{0} \text{ but otherwise arbitrary} \\
\mathbf{y}(k) &= A\mathbf{w}(k) - \mathbf{b}(k) \\
\mathbf{w}(k+1) &= \mathbf{w}(k) + \rho A^{\#}[\mathbf{y}(k) + |\mathbf{y}(k)|] \\
\mathbf{b}(k+1) &= \mathbf{b}(k) + \rho[\mathbf{y}(k) + |\mathbf{y}(k)|]
\end{aligned} \tag{1.8}$$

where k denotes the iteration number and $A^{\#}$ is the generalized inverse
of A. The algorithm is exponentially convergent and a solution
of $\mathbf{w}$ can be obtained in a finite number of iterations when all of
the components of $\mathbf{y}(k)$ become positive or zero, provided that the
given sample patterns are linearly separable.
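A minimal sketch of the iteration (1.8) is given below, assuming Python with NumPy and using `numpy.linalg.pinv` for the generalized inverse. The function name `ho_kashyap`, the tolerance `tol`, the iteration cap `max_iter`, and the two test matrices (two-variable AND and exclusive-OR in the -1, +1 representation) are assumptions made for illustration and are not part of the original text.

```python
import numpy as np

def ho_kashyap(A, b0, rho=1.0, max_iter=10000, tol=1e-12):
    """Sketch of iteration (1.8); returns (w, iterations used, status)."""
    A = np.asarray(A, dtype=float)
    A_pinv = np.linalg.pinv(A)              # generalized inverse A#
    b = np.array(b0, dtype=float)
    w = A_pinv @ b                          # w(0) = A# b(0)
    for k in range(max_iter):
        if np.all(A @ w > 0):               # A w > 0: a solution has been found
            return w, k, "solution"
        y = A @ w - b                       # y(k) = A w(k) - b(k)
        # all components nonpositive with at least one negative: inconsistent
        if np.all(y <= tol) and np.any(y < -tol):
            return w, k, "inconsistent"
        step = rho * (y + np.abs(y))
        w = w + A_pinv @ step
        b = b + step
    return w, max_iter, "undecided"

# Hypothetical test matrices, rows already negated for class C2.
A_and = np.array([[1, 1, 1], [-1, 1, 1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
A_xor = np.array([[1, 1, -1], [1, -1, 1], [-1, -1, -1], [-1, 1, 1]], dtype=float)

w_and, k_and, status_and = ho_kashyap(A_and, 0.1 * np.ones(4))
w_xor, k_xor, status_xor = ho_kashyap(A_xor, 0.1 * np.ones(4))
print(status_and, status_xor)
```

The tolerance in the inconsistency test is a floating-point safeguard only; the algorithm as stated tests the exact sign pattern of $\mathbf{y}(k)$.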

The Ho-Kashyap algorithm was developed from the viewpoint
of minimizing a criterion function $J = \|A\mathbf{w} - \mathbf{b}\|^2 = \|\mathbf{y}\|^2$. The
derivation consists of the following two steps: (1) for a fixed
$\mathbf{b} > \mathbf{0}$, determine a $\mathbf{w}$ to be a least square fit to $\mathbf{y} = A\mathbf{w} - \mathbf{b} = \mathbf{0}$;
(2) for a fixed $\mathbf{w}$, allow $\mathbf{b}$ to change in the direction of steepest
descent of J, subject to the constraint $\mathbf{b} > \mathbf{0}$. This algorithm has
a high convergence rate for a number of pattern recognition problems.
It also provides a test for nonlinear separability of the sample
patterns. If the given sample patterns are not linearly separable,
that is, the system $A\mathbf{w} > \mathbf{0}$ is inconsistent, this is indicated at
a certain step $k^*$ in the iteration by $\mathbf{y}(k^*) \le \mathbf{0}$, which is defined as
all components of $\mathbf{y}(k^*)$ being negative or zero but with at least one
non-zero component.

The generalization of the Ho-Kashyap algorithm to multiclass
pattern classification has been attempted by Blaydon, Fu and Wee,
and Li et al. Experimental results have also been reported.



C. Objectives of the Dissertation

As ascertained by Devyaterikov, Propoi, and Tsypkin,(18)
a general recursive formula can be obtained for the system of
inequalities (1.6) by minimizing a suitably chosen convex criterion
function $J(\mathbf{y})$. In addition to the original Ho-Kashyap algorithm,
which uses $J(\mathbf{y}) = \sum_i y_i^2$, other well known non-parametric learning
algorithms may also be interpreted as obtained from minimizing
different criterion functions $J(\mathbf{y})$, for example, $J = \sum_i (|y_i| - y_i)$ for
the perceptron's training algorithm, and $J = \sum_i (|y_i| - y_i)^2$ for the relaxation
type training algorithm. Thus the solution of a system of linear
inequalities can be made equivalent to a minimization problem.
With this concept as the motivation, it has been attempted
to choose another criterion function J having a steeper gradient than
Ho-Kashyap's, with a hope to further accelerate the convergence of
the algorithm. Thus, the main objectives of this dissertation are:
(1) to develop an improved iterative algorithm for the two-category
classification problem with the choice of

$$J(\mathbf{y}) = 4 \sum_{i=1}^{N} \left(\cosh \tfrac{1}{2} y_i\right)^2$$

and (2) to generalize this algorithm for multiclass pattern
classification. The convergence proofs are given in Chapter II and
Chapter V respectively. The improvement in the convergence rate has
been demonstrated by a number of computer experiments on switching
problems and pattern recognition problems. These experimental
results are presented in Chapters III and IV.
II. AN ACCELERATED ALGORITHM OF LINEAR
INEQUALITIES FOR DICHOTOMIZATION

A. Development of the Algorithm

In this chapter, an accelerated iterative algorithm will be
developed for the solution of the set of linear inequalities (1.6),
which is rewritten in the following equation:

$$A\mathbf{w} > \mathbf{0}. \tag{2.1}$$

This algorithm is an improvement of the Ho-Kashyap algorithm by choosing
a criterion function

$$J(\mathbf{y}) = 4 \sum_{i=1}^{N} \left(\cosh \tfrac{1}{2} y_i\right)^2 \tag{2.2}$$

to be minimized, where $y_i$ is the ith component of the N by 1 vector $\mathbf{y}$
defined in equation (1.7), that is

$$\mathbf{y} = A\mathbf{w} - \mathbf{b}. \tag{2.3}$$

The improvement lies in an acceleration of the Ho-Kashyap algorithm caused
by a steeper gradient of $J(\mathbf{y})$, as can be seen when a comparison is made
between the two criterion functions. Let $J_{hk}(\mathbf{y})$ designate the criterion
function used in the Ho-Kashyap algorithm,
$$J_{hk}(\mathbf{y}) = \|\mathbf{y}\|^2 = \sum_{i=1}^{N} y_i^2. \tag{2.4}$$

Since $J(\mathbf{y})$ and $J_{hk}(\mathbf{y})$ reach their respective minima when each $(\cosh \tfrac{1}{2} y_i)^2$
and each $y_i^2$ are respectively minimized, one can simply compare $J(y_i)$ and
$J_{hk}(y_i)$, the convex functions of one variable only, where

$$J(y_i) = 4\left(\cosh \tfrac{1}{2} y_i\right)^2 \tag{2.5}$$

and

$$J_{hk}(y_i) = y_i^2. \tag{2.6}$$

These two functions are illustrated in Figure 1. Taking the gradients of
$J(y_i)$ and $J_{hk}(y_i)$ with respect to $y_i$, one obtains

$$\frac{\partial J(y_i)}{\partial y_i} = 4\left(\cosh \tfrac{1}{2} y_i\right)\left(\sinh \tfrac{1}{2} y_i\right) = 2 \sinh y_i
= 2\left(y_i + \frac{y_i^3}{3!} + \frac{y_i^5}{5!} + \cdots\right) \tag{2.7}$$

and

$$\frac{\partial J_{hk}(y_i)}{\partial y_i} = 2 y_i. \tag{2.8}$$

It is clear that the absolute value of $\partial J(y_i)/\partial y_i$ is greater than the absolute
value of $\partial J_{hk}(y_i)/\partial y_i$ everywhere except at $y_i = 0$, where they are equal. In
general, the gradient $\partial J(\mathbf{y})/\partial \mathbf{y}$ is greater than the gradient $\partial J_{hk}(\mathbf{y})/\partial \mathbf{y}$ everywhere
except at the origin $\mathbf{y} = \mathbf{0}$. Since the gradient descent procedure is used
in both algorithms, and since $\mathbf{y}$ and $\mathbf{b}$, or $\mathbf{y}$ and $\mathbf{w}$, are linearly related,
it is conceivable that the proposed algorithm may have a higher convergence
rate for a solution $\mathbf{w}$.

Figure 1. Comparison of Criterion Functions $J(y_i)$ and $J_{hk}(y_i)$
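The comparison of the two gradients (2.7) and (2.8) can be checked numerically. The following Python/NumPy sketch evaluates both at a few sample values of $y_i$; the sample points and variable names are arbitrary choices made for illustration.

```python
import numpy as np

# Arbitrary sample values of y_i, including the origin.
y = np.array([-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0])

grad_cosh = 2.0 * np.sinh(y)   # gradient (2.7) of 4 (cosh(y/2))^2
grad_hk = 2.0 * y              # gradient (2.8) of y^2

for yi, gc, gh in zip(y, grad_cosh, grad_hk):
    print(f"y = {yi:5.1f}   2 sinh y = {gc:9.4f}   2 y = {gh:6.2f}")
```

At the origin both gradients vanish; away from it the hyperbolic gradient is larger in magnitude, and the gap widens rapidly as $|y_i|$ grows.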

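As mentioned before, $J(\mathbf{y})$ reaches a minimum when each term
$(\cosh \tfrac{1}{2} y_i)^2$, $(i = 1, \ldots, N)$, is minimized. For each $(\cosh \tfrac{1}{2} y_i)^2$ to be a
minimum, each $y_i$, $(i = 1, \ldots, N)$, must equal zero and $\mathbf{y} = \mathbf{0}$ gives a desired
solution. Thus one is attempting to cluster the values $[{}_i\mathbf{x}_1]^t\mathbf{w}$,
$-[{}_j\mathbf{x}_2]^t\mathbf{w}$, $(i = 1, 2, \ldots, n_1;\ j = 1, 2, \ldots, n_2)$, about the positive scalars $b_i$'s,
$(i = 1, 2, \ldots, N)$. Since the $b_i$'s are only constrained to be positive, $J(\mathbf{y})$
can be minimized with respect to both $\mathbf{w}$ and $\mathbf{b}$ subject to the condition
that $\mathbf{b} > \mathbf{0}$. Note that it is not necessary to attain the minimum value
of $J(\mathbf{y})$; in fact, a solution $\mathbf{w}^*$ is obtained whenever $\mathbf{y} \ge \mathbf{0}$ with $\mathbf{b} > \mathbf{0}$,
from which follows $A\mathbf{w}^* \ge \mathbf{b} > \mathbf{0}$.

Let the matrix A defined in (1.5) be also represented as

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{Nn}
\end{bmatrix}. \tag{2.9}$$

From (2.3),

$$y_i = a_{i1} w_1 + a_{i2} w_2 + \cdots + a_{in} w_n - b_i, \qquad (i = 1, 2, \ldots, N) \tag{2.10}$$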
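$$\frac{\partial y_i}{\partial w_j} = a_{ij} \tag{2.11}$$

and

$$\frac{\partial y_i}{\partial b_j} = -\delta_{ij}, \tag{2.12}$$

where $\delta_{ij}$ is the Kronecker delta. Let

$$J_i(\mathbf{y}) = 4\left(\cosh \tfrac{1}{2} y_i\right)^2, \qquad (i = 1, \ldots, N) \tag{2.13}$$

then

$$J(\mathbf{y}) = \sum_{i=1}^{N} J_i(\mathbf{y}). \tag{2.14}$$

The gradients of $J_i(\mathbf{y})$ with respect to $w_j$ and $b_j$ are respectively

$$\frac{\partial J_i(\mathbf{y})}{\partial w_j} = 4\left(\cosh \tfrac{1}{2} y_i\right)\left(\sinh \tfrac{1}{2} y_i\right)\frac{\partial y_i}{\partial w_j}
= 2(\sinh y_i)\frac{\partial y_i}{\partial w_j} = 2\, a_{ij} \sinh y_i \tag{2.15}$$

and

$$\frac{\partial J_i(\mathbf{y})}{\partial b_j} = 2(\sinh y_i)\frac{\partial y_i}{\partial b_j} = -2\,\delta_{ij} \sinh y_i. \tag{2.16}$$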

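Now,

$$\frac{\partial J_i(\mathbf{y})}{\partial \mathbf{w}} =
\begin{bmatrix} \partial J_i(\mathbf{y})/\partial w_1 \\ \vdots \\ \partial J_i(\mathbf{y})/\partial w_n \end{bmatrix}
= 2 \sinh y_i \begin{bmatrix} a_{i1} \\ \vdots \\ a_{in} \end{bmatrix} \tag{2.17}$$

and

$$\frac{\partial J_i(\mathbf{y})}{\partial \mathbf{b}} =
\begin{bmatrix} \partial J_i(\mathbf{y})/\partial b_1 \\ \vdots \\ \partial J_i(\mathbf{y})/\partial b_N \end{bmatrix}
= -2 \sinh y_i \begin{bmatrix} \delta_{i1} \\ \vdots \\ \delta_{iN} \end{bmatrix}, \tag{2.18}$$

where the derivative of a scalar with respect to a column vector is a
column vector. Hence, the gradient of $J(\mathbf{y})$ with respect to $\mathbf{w}$ is given by

$$\begin{aligned}
\frac{\partial J(\mathbf{y})}{\partial \mathbf{w}} &= \sum_{i=1}^{N} \frac{\partial J_i(\mathbf{y})}{\partial \mathbf{w}}
= 2 \sinh y_1 \begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}
+ 2 \sinh y_2 \begin{bmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{bmatrix}
+ \cdots
+ 2 \sinh y_N \begin{bmatrix} a_{N1} \\ a_{N2} \\ \vdots \\ a_{Nn} \end{bmatrix} \\
&= 2 \begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{N1} \\
\vdots & \vdots & & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{Nn}
\end{bmatrix}
\begin{bmatrix} \sinh y_1 \\ \sinh y_2 \\ \vdots \\ \sinh y_N \end{bmatrix}
= 2 A^t \begin{bmatrix} \sinh y_1 \\ \sinh y_2 \\ \vdots \\ \sinh y_N \end{bmatrix}
\end{aligned} \tag{2.19}$$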
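and the gradient of $J(\mathbf{y})$ with respect to $\mathbf{b}$ is given by

$$\frac{\partial J(\mathbf{y})}{\partial \mathbf{b}} = \sum_{i=1}^{N} \frac{\partial J_i(\mathbf{y})}{\partial \mathbf{b}}
= -2 I \begin{bmatrix} \sinh y_1 \\ \sinh y_2 \\ \vdots \\ \sinh y_N \end{bmatrix}
= -2 \begin{bmatrix} \sinh y_1 \\ \sinh y_2 \\ \vdots \\ \sinh y_N \end{bmatrix}.$$

Since $\mathbf{w}$ is not constrained in any way, $\partial J(\mathbf{y})/\partial \mathbf{w} = \mathbf{0}$ implies

$$\begin{bmatrix} \sinh y_1 \\ \sinh y_2 \\ \vdots \\ \sinh y_N \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \tag{2.20}$$

which, in turn, implies $y_i = 0$ for all $i = 1, 2, \ldots, N$. Therefore, for a
fixed $\mathbf{b} > \mathbf{0}$, minimizing $J(\mathbf{y})$ with respect to $\mathbf{w}$ gives

$$A\mathbf{w} - \mathbf{b} = \mathbf{0}.$$

Solving the above equation for $\mathbf{w}$, one obtains

$$\mathbf{w} = A^{\#}\mathbf{b}, \tag{2.21}$$

where $A^{\#}$ is the generalized inverse of A.(20)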
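The closed-form gradients, $2A^t[\sinh y_i]$ from (2.19) and $-2[\sinh y_i]$ for $\mathbf{b}$, can be compared with central finite differences. The sketch below is in Python with NumPy; the random test data, the seed, the helper names, and the step size `eps` are assumptions introduced purely for this check.

```python
import numpy as np

def criterion(A, w, b):
    """J(y) = 4 * sum(cosh(y_i / 2)^2) with y = A w - b, as in (2.2) and (2.3)."""
    y = A @ w - b
    return 4.0 * np.sum(np.cosh(0.5 * y) ** 2)

def grad_w(A, w, b):
    return 2.0 * A.T @ np.sinh(A @ w - b)   # equation (2.19)

def grad_b(A, w, b):
    return -2.0 * np.sinh(A @ w - b)        # gradient with respect to b

def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return g

rng = np.random.default_rng(1)              # arbitrary test data
A = rng.normal(size=(5, 3))
w = 0.3 * rng.normal(size=3)
b = rng.uniform(0.1, 0.5, size=5)

num_w = numeric_grad(lambda v: criterion(A, v, b), w)
num_b = numeric_grad(lambda v: criterion(A, w, v), b)
print(np.allclose(grad_w(A, w, b), num_w, atol=1e-5))
print(np.allclose(grad_b(A, w, b), num_b, atol=1e-5))
```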



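On the other hand, for a fixed $\mathbf{w}$, $\partial J(\mathbf{y})/\partial \mathbf{b} = \mathbf{0}$ with $\mathbf{b} > \mathbf{0}$ dictates
a descent procedure of the following form, with k denoting the iteration
number:

$$\mathbf{b}(k+1) = \mathbf{b}(k) + \Delta\mathbf{b}(k), \tag{2.22}$$

where the components $\Delta b_i(k)$, $i = 1, 2, \ldots, N$, of $\Delta\mathbf{b}(k)$ are governed by

$$\Delta b_i(k) \propto \begin{cases}
-\dfrac{\partial J(\mathbf{y}(k))}{\partial b_i} = 2 \sinh y_i(k) & \text{if } y_i(k) > 0, \\[2ex]
0 & \text{if } y_i(k) \le 0.
\end{cases} \tag{2.23}$$

Introduce a positive scalar $\rho(k)$ as the proportionality constant and
rewrite equation (2.23) in the vector form,

$$\Delta\mathbf{b}(k) = \rho(k) \begin{bmatrix}
\sinh y_1(k) + |\sinh y_1(k)| \\
\sinh y_2(k) + |\sinh y_2(k)| \\
\vdots \\
\sinh y_N(k) + |\sinh y_N(k)|
\end{bmatrix} = \rho(k)\,\mathbf{h}(k), \tag{2.24}$$

where

$$\mathbf{h}(k) = \begin{bmatrix} h_1(k) \\ h_2(k) \\ \vdots \\ h_N(k) \end{bmatrix}
= \begin{bmatrix}
\sinh y_1(k) + |\sinh y_1(k)| \\
\sinh y_2(k) + |\sinh y_2(k)| \\
\vdots \\
\sinh y_N(k) + |\sinh y_N(k)|
\end{bmatrix}. \tag{2.25}$$

As will be shown later, $\rho(k)$ may be chosen as equal to

$$\rho(k) = \frac{1}{\cosh y_{\max}(k)}, \tag{2.26}$$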
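where

$$y_{\max}(k) = \max_i |y_i(k)|. \tag{2.27}$$

Substituting (2.24) into (2.22) and, from (2.21), writing

$$\mathbf{w}(k+1) = A^{\#}\mathbf{b}(k+1) = A^{\#}[\mathbf{b}(k) + \Delta\mathbf{b}(k)]
= \mathbf{w}(k) + \rho(k)\,A^{\#}\mathbf{h}(k), \tag{2.28}$$

one obtains the following algorithm:

$$\begin{aligned}
\mathbf{w}(0) &= A^{\#}\mathbf{b}(0), \qquad \mathbf{b}(0) > \mathbf{0} \text{ but otherwise arbitrary} \\
\mathbf{y}(k) &= A\mathbf{w}(k) - \mathbf{b}(k) \\
\mathbf{b}(k+1) &= \mathbf{b}(k) + \rho(k)\,\mathbf{h}(k) \\
\mathbf{w}(k+1) &= \mathbf{w}(k) + \rho(k)\,A^{\#}\mathbf{h}(k)
\end{aligned} \tag{2.29}$$

where $\mathbf{h}(k)$ and $\rho(k)$ are given by equations (2.25) and (2.26) respectively.
Note that in this algorithm $\rho(k)$ varies at each step and is a nonlinear
function of $\mathbf{y}(k)$. A recursive relation in $\mathbf{y}(k)$ can also be obtained from
(2.29):

$$\begin{aligned}
\mathbf{y}(k+1) &= A\mathbf{w}(k+1) - \mathbf{b}(k+1) = A A^{\#}\mathbf{b}(k+1) - \mathbf{b}(k+1) \\
&= A A^{\#}[\mathbf{b}(k) + \rho(k)\mathbf{h}(k)] - \mathbf{b}(k) - \rho(k)\mathbf{h}(k) \\
&= A\mathbf{w}(k) - \mathbf{b}(k) + (A A^{\#} - I)\,\rho(k)\,\mathbf{h}(k), \\
\mathbf{y}(k+1) &= \mathbf{y}(k) + \rho(k)\,(A A^{\#} - I)\,\mathbf{h}(k).
\end{aligned} \tag{2.30}$$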
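One iteration of (2.29), together with a numerical check of the recursion (2.30), might be sketched as follows. Python with NumPy is assumed; the function name `accelerated_step`, the random pattern matrix, the seed, and the initial value $\mathbf{b}(0) = 0.3$ are hypothetical choices made for illustration, and `numpy.linalg.pinv` stands in for the generalized inverse.

```python
import numpy as np

def accelerated_step(A, A_pinv, w, b):
    """One iteration of (2.29) with rho(k) = 1 / cosh(y_max(k)) from (2.26)."""
    y = A @ w - b
    s = np.sinh(y)
    h = s + np.abs(s)                          # equation (2.25)
    rho = 1.0 / np.cosh(np.max(np.abs(y)))     # equations (2.26) and (2.27)
    w_next = w + rho * (A_pinv @ h)
    b_next = b + rho * h
    return w_next, b_next, y, h, rho

rng = np.random.default_rng(0)                 # arbitrary test data
A = rng.normal(size=(6, 3))
A_pinv = np.linalg.pinv(A)
b = np.full(6, 0.3)                            # b(0) > 0
w = A_pinv @ b                                 # w(0) = A# b(0)

w1, b1, y, h, rho = accelerated_step(A, A_pinv, w, b)
y1 = A @ w1 - b1                               # y(k+1) computed directly
predicted = y + rho * (A @ A_pinv - np.eye(6)) @ h   # right-hand side of (2.30)
print(np.allclose(y1, predicted))
```

Because $\mathbf{h}(k)$ has no negative components and $\rho(k) > 0$, the vector $\mathbf{b}(k)$ never decreases from one step to the next.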
Just like the Ho-Kashyap algorithm, it can be shown that the
above algorithm (2.29) converges to a solution $\mathbf{w}^*$ of the system of
linear inequalities in a finite number of steps provided that a solution
exists, and simultaneously acts as a test for the inconsistency of the
linear inequalities. These properties are formally stated in a theorem
as given in the next section.

B. Theorem 1

Before discussing the main theorem, a lemma to be used in the
proof of the theorem will be given first.

Lemma 1: Let one consider the set of linear inequalities (2.1) and the
algorithm (2.29) to solve this set. Then
1) y^(k) £ 0 for any k; 

and 

2) if the set of linear inequalities is consistent, then
$\mathbf{y}(k) \nleq \mathbf{0}$ for any k.

This lemma is the same as the one given by Ho and Kashyap(12) except that
the iterative algorithm is different. The proof of the lemma is not given
here since it is identical to the proof of the Ho-Kashyap lemma. Recall again
the notation used in the lemma: $\mathbf{y}(k) \le \mathbf{0}$ means that $y_i(k) \le 0$ for all i
but $\mathbf{y}$ possesses at least one negative component. This lemma is a rigorous
statement that with a consistent set of linear inequalities $A\mathbf{w} > \mathbf{0}$, the
elements of the vector $\mathbf{y}$ cannot be all non-positive.
Theorem 1: Consider the set of linear inequalities (2.1) and the
algorithm (2.29) to solve these inequalities, and let
$V[\mathbf{y}(k)] = \|\mathbf{y}(k)\|^2$.

1) If the set of linear inequalities is consistent, then

a) $\Delta V[\mathbf{y}(k)] = V[\mathbf{y}(k+1)] - V[\mathbf{y}(k)] < 0$ and $\lim_{k \to \infty} V[\mathbf{y}(k)] = 0$,
implying convergence to a solution in an infinite number
of steps; and

b) actually, a solution is obtained in a finite number of
steps.

2) If the set of linear inequalities is inconsistent, then
there exists a positive integer $k^*$ such that

$$\begin{aligned}
\Delta V[\mathbf{y}(k)] &< 0 && \text{for } k < k^*, \\
\Delta V[\mathbf{y}(k)] &= 0 && \text{for } k \ge k^*,
\end{aligned}$$

and

$$\begin{aligned}
\mathbf{y}(k) &\nleq \mathbf{0} && \text{for } k < k^*, \\
\mathbf{y}(k) &= \mathbf{y}(k^*) \le \mathbf{0} && \text{for } k \ge k^*,
\end{aligned}$$

and

$$\begin{aligned}
\mathbf{w}(k) &= \mathbf{w}(k^*) && \text{for } k \ge k^*, \\
\mathbf{b}(k) &= \mathbf{b}(k^*) && \text{for } k \ge k^*.
\end{aligned}$$

In other words, the occurrence of a nonpositive vector $\mathbf{y}(k)$
at any step terminates the algorithm and indicates the
inconsistency of the given set of linear inequalities.
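Proof:

Part 1: Since the algorithm (2.29) can be rewritten as a recursive relation
in $\mathbf{y}(k)$ given by (2.30), and

$$V[\mathbf{y}(k)] = \|\mathbf{y}(k)\|^2 > 0 \quad \text{for all } \mathbf{y}(k) \neq \mathbf{0}, \tag{2.31}$$

$V[\mathbf{y}(k)]$ can be considered as a Liapunov function for the nonlinear
difference equation (2.30). Thus

$$\begin{aligned}
\Delta V[\mathbf{y}(k)] &= V[\mathbf{y}(k+1)] - V[\mathbf{y}(k)] \\
&= \|\mathbf{y}(k+1)\|^2 - \|\mathbf{y}(k)\|^2 = \mathbf{y}^t(k+1)\,\mathbf{y}(k+1) - \mathbf{y}^t(k)\,\mathbf{y}(k) \\
&= [\mathbf{y}(k) + \rho(k)(A A^{\#} - I)\mathbf{h}(k)]^t\,[\mathbf{y}(k) + \rho(k)(A A^{\#} - I)\mathbf{h}(k)] - \mathbf{y}^t(k)\,\mathbf{y}(k) \\
&= \rho(k)\,\mathbf{h}^t(k)(A A^{\#} - I)^t\,\mathbf{y}(k) + \rho(k)\,\mathbf{y}^t(k)(A A^{\#} - I)\,\mathbf{h}(k) \\
&\quad + \rho^2(k)\,\mathbf{h}^t(k)(A A^{\#} - I)^t (A A^{\#} - I)\,\mathbf{h}(k).
\end{aligned}$$

Since $A A^{\#}$ is hermitian idempotent,(20)

$$(A A^{\#} - I)^t = (A A^{\#} - I),$$

$$(A A^{\#} - I)^t (A A^{\#} - I) = A A^{\#} A A^{\#} - A A^{\#} - A A^{\#} + I
= A A^{\#} - A A^{\#} - A A^{\#} + I = I - A A^{\#};$$

then,

$$\Delta V[\mathbf{y}(k)] = 2\rho(k)\,\mathbf{h}^t(k)(A A^{\#} - I)\,\mathbf{y}(k) + \rho^2(k)\,\mathbf{h}^t(k)(I - A A^{\#})\,\mathbf{h}(k).$$

Moreover,

$$\begin{aligned}
A A^{\#}\mathbf{y}(k) &= A A^{\#}[A\mathbf{w}(k) - \mathbf{b}(k)] = A A^{\#}[A A^{\#}\mathbf{b}(k) - \mathbf{b}(k)] \\
&= (A A^{\#} A A^{\#} - A A^{\#})\,\mathbf{b}(k) = [A A^{\#} - A A^{\#}]\,\mathbf{b}(k) = \mathbf{0},
\end{aligned}$$

hence $\Delta V[\mathbf{y}(k)]$ reduces to

$$\Delta V[\mathbf{y}(k)] = -2\rho(k)\,\mathbf{h}^t(k)\,\mathbf{y}(k) + \rho^2(k)\,\mathbf{h}^t(k)(I - A A^{\#})\,\mathbf{h}(k). \tag{2.32}$$

Let

$$\mathbf{s}(\mathbf{y}) = \begin{bmatrix} \sinh y_1 \\ \sinh y_2 \\ \vdots \\ \sinh y_N \end{bmatrix}
= \begin{bmatrix}
\dfrac{\sinh y_1}{y_1} & 0 & \cdots & 0 \\
0 & \dfrac{\sinh y_2}{y_2} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \dfrac{\sinh y_N}{y_N}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
= R(\mathbf{y})\,\mathbf{y}, \tag{2.33}$$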
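where

$$R(\mathbf{y}) = \operatorname{diag}\left(\frac{\sinh y_1}{y_1}, \frac{\sinh y_2}{y_2}, \ldots, \frac{\sinh y_N}{y_N}\right)
= \operatorname{diag}(r_{11}, r_{22}, \ldots, r_{NN}). \tag{2.34}$$

Note that $\dfrac{\sinh y_i}{y_i} > 0$ for all $y_i$, $R(\mathbf{y}) > 0$ and $R^t(\mathbf{y}) = R(\mathbf{y})$.
Then $\mathbf{s}(\mathbf{y})$ has the following properties:

$$\begin{aligned}
\mathbf{s}^t(\mathbf{y}) &= \mathbf{y}^t R(\mathbf{y}), \\
|\mathbf{s}(\mathbf{y})| &= |R(\mathbf{y})\,\mathbf{y}| = R(\mathbf{y})\,|\mathbf{y}|, \\
|\mathbf{s}(\mathbf{y})|^t &= |\mathbf{s}^t(\mathbf{y})| = |\mathbf{y}|^t R(\mathbf{y}).
\end{aligned} \tag{2.35}$$

From (2.25) and (2.35), the properties of $\mathbf{h}(\mathbf{y})$ are:

$$\begin{aligned}
\mathbf{h}(\mathbf{y}) &= \mathbf{s}(\mathbf{y}) + |\mathbf{s}(\mathbf{y})| = R(\mathbf{y})\,\mathbf{y} + R(\mathbf{y})\,|\mathbf{y}| = R(\mathbf{y})\,[\mathbf{y} + |\mathbf{y}|], \\
\mathbf{h}^t(\mathbf{y}) &= [\mathbf{s}(\mathbf{y}) + |\mathbf{s}(\mathbf{y})|]^t = \mathbf{s}^t(\mathbf{y}) + |\mathbf{s}(\mathbf{y})|^t = \mathbf{y}^t R(\mathbf{y}) + |\mathbf{y}|^t R(\mathbf{y}) \\
&= [\mathbf{y} + |\mathbf{y}|]^t R(\mathbf{y}).
\end{aligned} \tag{2.36}$$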
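The factorization $\mathbf{h}(\mathbf{y}) = R(\mathbf{y})[\mathbf{y} + |\mathbf{y}|]$ in (2.36) can be verified on a small example. In the Python/NumPy sketch below, the helper name `r_diag`, the sample vector, and the convention $r_{ii} = 1$ at $y_i = 0$ (the limiting value of $\sinh y_i / y_i$, which the text does not state explicitly) are assumptions made for illustration.

```python
import numpy as np

def r_diag(y):
    """Diagonal of R(y) in (2.34): sinh(y_i) / y_i, taken as 1 where y_i = 0."""
    y = np.asarray(y, dtype=float)
    r = np.ones_like(y)          # assumed limiting value at y_i = 0
    nz = y != 0
    r[nz] = np.sinh(y[nz]) / y[nz]
    return r

y = np.array([-2.0, -0.5, 0.0, 0.3, 1.7])      # arbitrary sample vector

s = np.sinh(y)
h_direct = s + np.abs(s)                       # definition (2.25)
h_factored = r_diag(y) * (y + np.abs(y))       # factored form (2.36)

# Cross term [y + |y|]^t R [y - |y|]: each product has at least one zero factor.
cross = (y + np.abs(y)) @ (r_diag(y) * (y - np.abs(y)))
print(np.allclose(h_direct, h_factored), cross)
```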
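Reducing the first term of equation (2.32) by the relation in (2.36), one
obtains

$$\begin{aligned}
-2\rho(k)\,\mathbf{h}^t(k)\,\mathbf{y}(k) &= -2\rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,\mathbf{y}(k) \\
&= -\rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,\mathbf{y}(k) - \rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,\mathbf{y}(k).
\end{aligned}$$

Adding $|\mathbf{y}(k)|$ to the last factor of the first term and subtracting $|\mathbf{y}(k)|$ from
the last factor of the second term on the right hand side of the above
equation gives

$$\begin{aligned}
-2\rho(k)\,\mathbf{h}^t(k)\,\mathbf{y}(k) &= -\rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|] \\
&\quad - \rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) - |\mathbf{y}(k)|].
\end{aligned} \tag{2.37}$$

It will be shown that the second term on the right hand side of equation
(2.37) is zero. Since

$$\begin{aligned}
&[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) - |\mathbf{y}(k)|] \\
&\quad = [\,y_1(k) + |y_1(k)|, \ldots, y_N(k) + |y_N(k)|\,]
\begin{bmatrix}
r_{11}(k) & 0 & \cdots & 0 \\
0 & r_{22}(k) & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & r_{NN}(k)
\end{bmatrix}
\begin{bmatrix}
y_1(k) - |y_1(k)| \\ y_2(k) - |y_2(k)| \\ \vdots \\ y_N(k) - |y_N(k)|
\end{bmatrix} \\
&\quad = \sum_{i=1}^{N} r_{ii}(k)\,[y_i(k) + |y_i(k)|]\,[y_i(k) - |y_i(k)|]
\end{aligned} \tag{2.38}$$

and

$$\begin{aligned}
(y_i(k) - |y_i(k)|) &= 0 \quad \text{if } y_i(k) \ge 0, \\
(y_i(k) + |y_i(k)|) &= 0 \quad \text{if } y_i(k) \le 0,
\end{aligned}$$

therefore,

$$[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) - |\mathbf{y}(k)|] = 0. \tag{2.39}$$

Substitute (2.39) into (2.37). The first term of equation (2.32) is then
reduced to

$$-2\rho(k)\,\mathbf{h}^t(k)\,\mathbf{y}(k) = -\rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]. \tag{2.40}$$

Substituting (2.36) and (2.40) into (2.32), one obtains

$$\begin{aligned}
\Delta V[\mathbf{y}(k)] &= -\rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|] \\
&\quad + \rho^2(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,(I - A A^{\#})\,R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|] \\
&= -[\mathbf{y}(k) + |\mathbf{y}(k)|]^t\,[\rho(k) R(k) + \rho^2(k) R(k)(A A^{\#} - I) R(k)]\,[\mathbf{y}(k) + |\mathbf{y}(k)|] \\
&= -\big\|\mathbf{y}(k) + |\mathbf{y}(k)|\big\|^2_{\,[\rho^2(k) R(k) A A^{\#} R(k) + \rho(k) R(k) - \rho^2(k) R^2(k)]}.
\end{aligned} \tag{2.41}$$

For $\Delta V[\mathbf{y}(k)]$ to be negative semidefinite, in particular $\Delta V[\mathbf{y}(k)] = 0$ only
if $\mathbf{y}(k) = \mathbf{0}$ or $\mathbf{y}(k) \le \mathbf{0}$, the matrix
$[\rho^2(k) R(k) A A^{\#} R(k) + \rho(k) R(k) - \rho^2(k) R^2(k)]$
must be positive definite. $A A^{\#}$ is positive semidefinite: since $A A^{\#}$ is
hermitian idempotent, it follows that, for any $\mathbf{x}$,

$$\mathbf{x}^t A A^{\#}\mathbf{x} = \mathbf{x}^t A A^{\#} A A^{\#}\mathbf{x} = (A A^{\#}\mathbf{x})^t (A A^{\#}\mathbf{x}) = \|A A^{\#}\mathbf{x}\|^2 \ge 0;$$

hence $R A A^{\#} R$ is also positive semidefinite. Now one can choose a $\rho(k)$ such
that $[\rho(k) R(k) - \rho^2(k) R^2(k)]$ is positive definite. From (2.34),

$$\rho(k) R(k) - \rho^2(k) R^2(k) = \operatorname{diag}\big(\rho(k)\,r_{11}(k) - \rho^2(k)\,r_{11}^2(k),\ \ldots,\ \rho(k)\,r_{NN}(k) - \rho^2(k)\,r_{NN}^2(k)\big).$$

$[\rho(k) R(k) - \rho^2(k) R^2(k)]$ is positive definite if

$$[\rho(k)\,r_{ii}(k) - \rho^2(k)\,r_{ii}^2(k)] > 0 \quad \text{for all } i = 1, 2, \ldots, N. \tag{2.42}$$

Since $r_{ii}(k) = \dfrac{\sinh y_i(k)}{y_i(k)} > 0$ for all i and $\rho(k)$ is restricted to be positive,
the above condition reduces to the condition

$$1 - \rho(k)\,r_{ii}(k) > 0 \quad \text{for all } i = 1, 2, \ldots, N. \tag{2.43}$$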

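For $\rho(k)$ chosen in equation (2.26),

$$\rho(k) = \frac{1}{\cosh y_{\max}(k)},$$

where

$$y_{\max}(k) = \max_i |y_i(k)|,$$

$$\begin{aligned}
\rho(k)\,r_{ii}(k) &= \frac{\sinh y_i(k)}{y_i(k)\,\cosh y_{\max}(k)}
= \frac{y_i(k) + \dfrac{y_i^3(k)}{3!} + \dfrac{y_i^5(k)}{5!} + \cdots}
{y_i(k)\left(1 + \dfrac{y_{\max}^2(k)}{2!} + \dfrac{y_{\max}^4(k)}{4!} + \cdots\right)} \\
&= \frac{\displaystyle\sum_{n=0}^{\infty} \frac{y_i^{2n}(k)}{(2n+1)!}}
{\displaystyle\sum_{n=0}^{\infty} \frac{y_{\max}^{2n}(k)}{(2n)!}}.
\end{aligned}$$

Note that

$$\frac{y_i^2(k)}{y_{\max}^2(k)} \le 1 \quad \text{for all } i = 1, \ldots, N,$$

so that

$$\frac{y_i^{2n}(k)/(2n+1)!}{y_{\max}^{2n}(k)/(2n)!} < 1 \quad \text{for all } i = 1, 2, \ldots, N;\ n = 1, 2, \ldots, \infty;$$

it follows that

$$\rho(k)\,r_{ii}(k) = \frac{\displaystyle\sum_{n=0}^{\infty} \frac{y_i^{2n}(k)}{(2n+1)!}}
{\displaystyle\sum_{n=0}^{\infty} \frac{y_{\max}^{2n}(k)}{(2n)!}} < 1. \tag{2.44}$$

Hence the condition (2.43) is satisfied and $[\rho(k) R(k) - \rho^2(k) R^2(k)]$ is
positive definite for $\rho(k) = \dfrac{1}{\cosh y_{\max}(k)}$. Thus $\Delta V[\mathbf{y}(k)]$ has the desired
property of being negative semidefinite for $\rho(k) = \dfrac{1}{\cosh y_{\max}(k)}$ and for any
finite $\mathbf{y}(k)$.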
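The bound (2.44) and the quadratic form (2.41) can both be checked numerically for a single step. The following Python/NumPy sketch uses a random pattern matrix and seed chosen only for illustration; the helper `r_diag`, with the value 1 assumed at $y_i = 0$, is not part of the original text.

```python
import numpy as np

def r_diag(y):
    """Diagonal of R(y): sinh(y_i) / y_i, taken as 1 where y_i = 0 (assumed limit)."""
    r = np.ones_like(y)
    nz = y != 0
    r[nz] = np.sinh(y[nz]) / y[nz]
    return r

rng = np.random.default_rng(0)                 # arbitrary test data
A = rng.normal(size=(6, 3))
P = A @ np.linalg.pinv(A)                      # A A#
I = np.eye(6)
b = rng.uniform(0.1, 0.5, size=6)
y = (P - I) @ b                                # y(k) when w(k) = A# b(k)

r = r_diag(y)
rho = 1.0 / np.cosh(np.max(np.abs(y)))         # choice (2.26)
v = y + np.abs(y)
h = r * v                                      # h = R (y + |y|), from (2.36)

y_next = y + rho * (P - I) @ h                 # recursion (2.30)
dV_direct = y_next @ y_next - y @ y            # Delta V computed from its definition

R = np.diag(r)
M = rho * R + rho**2 * R @ (P - I) @ R         # matrix appearing in (2.41)
dV_form = -v @ M @ v

print(np.all(rho * r < 1.0), np.isclose(dV_direct, dV_form), dV_direct <= 0.0)
```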

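From equation (2.41) one notes that $\Delta V[\mathbf{y}(k)]$ equals zero if and
only if $\mathbf{y}(k) = \mathbf{0}$ or $\mathbf{y}(k) \le \mathbf{0}$. Since it is assumed that the set of linear
inequalities (2.1) is consistent, and from the lemma $\mathbf{y}(k) \nleq \mathbf{0}$, therefore

$$\begin{aligned}
\Delta V[\mathbf{y}(k)] &< 0 \quad \text{for all } \mathbf{y}(k) \neq \mathbf{0}, \\
&= 0 \quad \text{if } \mathbf{y}(k) = \mathbf{0}.
\end{aligned} \tag{2.45}$$

By Liapunov's stability criterion, the equilibrium state $\mathbf{y} = \mathbf{0}$ of the
discrete system (2.30) can be reached asymptotically, i.e.,

$$\lim_{k \to \infty} \|\mathbf{y}(k)\| = 0,$$

which corresponds to a solution $\mathbf{w}^*$ with $A\mathbf{w}^* = \mathbf{b} > \mathbf{0}$. This completes
the proof of Part 1(a).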

To prove the convergence of the algorithm (2.29) in a finite number
of steps, one notes that $\mathbf{b}(k)$ is a nondecreasing vector. Let

$$\mathbf{b}^t(0) = [1, 1, \ldots, 1],$$

then

$$\mathbf{b}^t(k) \ge \mathbf{b}^t(0) = [1, 1, \ldots, 1] \quad \text{for any } k > 0.$$

Since $A\mathbf{w}(k) = \mathbf{b}(k) + \mathbf{y}(k)$, $|\mathbf{y}^t(k)| < [1, 1, \ldots, 1]$ implies $A\mathbf{w}(k) > \mathbf{0}$,
that is, a solution $\mathbf{w}$ is reached. But $V[\mathbf{y}(k)] < 1$ implies $|\mathbf{y}^t(k)| < [1, \ldots, 1]$.
Since $V[\mathbf{y}(k)]$ converges to zero in infinite time, it must converge to
the region $V[\mathbf{y}(k)] < 1$ in finite time, hence $|\mathbf{y}^t(k)| < [1, 1, \ldots, 1]$,
$A\mathbf{w}(k) > \mathbf{0}$, and a solution $\mathbf{w}^* = \mathbf{w}(k)$ is obtained in a finite number of
steps. This completes the proof of Part 1(b).

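Part 2: It has been proved in Part 1 that $\Delta V[\mathbf{y}(k)]$ is negative semidefinite
independent of the consistency of the linear inequalities. Now, if the
set of linear inequalities (2.1) is inconsistent, one notes that $\mathbf{y}(k)$
cannot be $\mathbf{0}$ and hence $V[\mathbf{y}(k)]$ cannot become zero for any $k > 0$. There
must exist a value of k, called $k^*$, such that

$$\begin{aligned}
\Delta V[\mathbf{y}(k)] &< 0 && \text{for } 0 \le k < k^*, \\
&= 0 && \text{for } k = k^*, \\
\mathbf{y}(k) &\nleq \mathbf{0} && \text{for } 0 \le k < k^*.
\end{aligned}$$

But $\Delta V[\mathbf{y}(k^*)] = 0$ if either $\mathbf{y}(k^*) = \mathbf{0}$ or $\mathbf{y}(k^*) \le \mathbf{0}$. Since $\mathbf{y}(k^*) \neq \mathbf{0}$,
this implies $\mathbf{y}(k^*) \le \mathbf{0}$ and hence, from (2.25), $\mathbf{h}(k^*) = \mathbf{0}$. Equation (2.30)
indicates that

$$\mathbf{y}(k) = \mathbf{y}(k^*) \le \mathbf{0} \quad \text{for all } k \ge k^*.$$

As a consequence, one obtains

$$\begin{aligned}
\Delta V[\mathbf{y}(k)] &= 0 && \text{for all } k \ge k^*, \\
\mathbf{h}(k) &= \mathbf{0} && \text{for all } k \ge k^*, \\
\mathbf{w}(k) &= \mathbf{w}(k^*) && \text{for all } k \ge k^*, \\
\mathbf{b}(k) &= \mathbf{b}(k^*) && \text{for all } k \ge k^*.
\end{aligned}$$

This completes the proof of the theorem.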



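C. An Optimum Choice of $\rho(k)$

The choice of $\rho(k) = \dfrac{1}{\cosh y_{\max}(k)}$ in the previous section is
only one of many possible choices of $\rho(k)$ for the convergence of the
algorithm (2.29). The convergence rate may be further improved by
choosing a $\rho(k)$ such that the decrease in the Liapunov function $V[\mathbf{y}(k)]$
is maximized at every step, that is, $-\Delta V[\mathbf{y}(k)]$ is maximized with
respect to $\rho(k)$.

Take the partial derivative of $\Delta V[\mathbf{y}(k)]$ in equation (2.41)
with respect to $\rho(k)$,

$$\begin{aligned}
\frac{\partial\{-\Delta V[\mathbf{y}(k)]\}}{\partial\{\rho(k)\}}
&= \frac{\partial}{\partial \rho(k)}\Big([\mathbf{y}(k) + |\mathbf{y}(k)|]^t\,[\rho(k) R(k) + \rho^2(k) R(k)(A A^{\#} - I) R(k)]\,[\mathbf{y}(k) + |\mathbf{y}(k)|]\Big) \\
&= [\mathbf{y}(k) + |\mathbf{y}(k)|]^t\,[R(k) - 2\rho(k) R(k)(I - A A^{\#}) R(k)]\,[\mathbf{y}(k) + |\mathbf{y}(k)|].
\end{aligned} \tag{2.46}$$

For $-\Delta V[\mathbf{y}(k)]$ to be a maximum, $\dfrac{\partial\{-\Delta V[\mathbf{y}(k)]\}}{\partial\{\rho(k)\}}$ must equal zero as a necessary
condition. Hence,

$$2\rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t\,[R(k)(I - A A^{\#}) R(k)]\,[\mathbf{y}(k) + |\mathbf{y}(k)|]
= [\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|],$$

$$\rho(k) = \frac{[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]}
{2\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[I - A A^{\#}]\,R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]}, \tag{2.47}$$

provided that

$$[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[I - A A^{\#}]\,R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|] \neq 0. \tag{2.48}$$

During the iteration process, $\mathbf{y}(k) \neq \mathbf{0}$ and $\mathbf{y}(k) \nleq \mathbf{0}$. Since $R(k) > 0$
and $I - A A^{\#} \ge 0$, the condition (2.48) is satisfied unless $I - A A^{\#} = 0$;
therefore, for $I - A A^{\#} \neq 0$, both numerator and denominator in (2.47)
are positive definite, hence $\rho(k)$ given by (2.47) is positive. At this
value of $\rho(k)$, $\Delta V[\mathbf{y}(k)]$ is negative definite in $[\mathbf{y}(k) + |\mathbf{y}(k)|]$, which
is required in the convergence proof of the algorithm (2.29). This can
be shown by substituting (2.47) into (2.41) which, upon simplification,
gives

$$\Delta V[\mathbf{y}(k)] = -\tfrac{1}{2}\,\rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|] < 0. \tag{2.49}$$

For this value of $-\Delta V[\mathbf{y}(k)]$ to be a maximum, $\dfrac{\partial^2\{-\Delta V[\mathbf{y}(k)]\}}{\partial\{\rho(k)\}^2}$ must be
less than zero for $\rho(k)$ given by (2.47). In general,

$$\frac{\partial^2\{-\Delta V[\mathbf{y}(k)]\}}{\partial\{\rho(k)\}^2}
= [\mathbf{y}(k) + |\mathbf{y}(k)|]^t\,[2 R(k)(A A^{\#} - I) R(k)]\,[\mathbf{y}(k) + |\mathbf{y}(k)|], \tag{2.50}$$

which is negative definite in $[\mathbf{y}(k) + |\mathbf{y}(k)|]$. Thus $\rho(k)$ of equation
(2.47) does maximize $-\Delta V[\mathbf{y}(k)]$ at each iteration and is the optimum
choice if $I - A A^{\#} \neq 0$.

If $I - A A^{\#} = 0$, equation (2.48) is not satisfied and $-\Delta V[\mathbf{y}(k)]$
becomes a linear function of $\rho(k)$,

$$-\Delta V[\mathbf{y}(k)] = \rho(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|]^t R(k)\,[\mathbf{y}(k) + |\mathbf{y}(k)|],$$

which has no finite maxima at finite $\rho(k)$. Equation (2.47) cannot be
used, but any other positive $\rho(k)$ greater than $\dfrac{1}{\cosh y_{\max}(k)}$ will
improve the convergence rate.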
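The choice (2.47) and the resulting decrease (2.49) can be examined numerically. The sketch below, in Python with NumPy, is a minimal illustration under stated assumptions: the function name `optimal_rho`, the helper `r_diag` (with the value 1 assumed at $y_i = 0$), the randomly generated consistent test problem, and the seed are all hypothetical and serve only to exercise the formulas.

```python
import numpy as np

def r_diag(y):
    """Diagonal of R(y): sinh(y_i) / y_i, taken as 1 where y_i = 0 (assumed limit)."""
    r = np.ones_like(y)
    nz = y != 0
    r[nz] = np.sinh(y[nz]) / y[nz]
    return r

def optimal_rho(P, y):
    """rho(k) from (2.47), with P = A A#; also returns the numerator."""
    v = y + np.abs(y)
    Rv = r_diag(y) * v
    num = v @ Rv
    den = 2.0 * Rv @ (Rv - P @ Rv)             # 2 v^t R (I - A A#) R v
    if den <= 0.0:
        raise ValueError("condition (2.48) is not satisfied")
    return num / den, num

# Hypothetical consistent problem: rows are sign-flipped so that A w_true > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
A = X * np.sign(X @ np.array([1.0, -1.0, 0.5]))[:, None]
P = A @ np.linalg.pinv(A)
I = np.eye(8)
y = (P - I) @ np.full(8, 0.3)                  # y(0) for b(0) = 0.3

rho_opt, num = optimal_rho(P, y)
rho_cosh = 1.0 / np.cosh(np.max(np.abs(y)))    # choice (2.26), for comparison
h = r_diag(y) * (y + np.abs(y))

def delta_v(rho):
    """Delta V for one step of the recursion (2.30) with the given rho."""
    y_next = y + rho * (P - I) @ h
    return y_next @ y_next - y @ y

print(delta_v(rho_opt), -0.5 * rho_opt * num, delta_v(rho_cosh))
```

The first two printed values should agree, as stated in (2.49), and the decrease obtained with (2.47) should be at least as large as that obtained with (2.26).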



D. Summary of the Procedure 

The following ten steps summarize the procedure developed to
solve for a solution w of a set of linear inequalities A w > 0.

1. Select a b(0) > 0. Calculate the initial weight vector
w(0), where $w(0) = A^{\#} b(0)$.

2. Determine the z vector, where z = A w.

3. Check if the z vector is greater than 0, that is, all $z_i > 0$
for i = 1, ..., N.

4. If z is greater than 0, a solution w has just been obtained
and the problem is linearly separable; otherwise

5. Calculate the y vector by y = z - b.

6. Check the y vector to see if $y_i \le 0$ for all
i = 1, ..., N, but with at least one negative component.

7. If $y \le 0$ in this sense, then the set of linear inequalities is
inconsistent or the problem is not linearly separable; otherwise

8. Modify b such that b = b + ρ h, where h is calculated
from equation (2.25) and ρ from either equation (2.26)
or equation (2.47).

9. Modify w such that $w = w + \rho A^{\#} h$.

10. Return to step 2.

The above steps are shown in the flow chart of Figure 2. Notice
that, just like the Ho-Kashyap algorithm, the process continues until
the consistency or separability of the problem is determined.
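
The ten steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MAD program of the appendices: the generalized inverse is supplied by the caller as `A_pinv`, the step size is taken as ρ(k) = 1/cosh(y_max(k)), h(k) is taken as sinh y + |sinh y|, and the names `solve_inequalities`, `b0`, and the safeguard `max_iter` are introduced here for illustration.

```python
import math

def solve_inequalities(A, A_pinv, b0=0.1, max_iter=10000):
    """Ten-step procedure for A w > 0; A_pinv is the generalized inverse of A.

    Returns (status, w, k) with status in {"separable", "inconsistent",
    "undecided"}; max_iter is a safeguard added for this sketch.
    """
    N = len(A)
    matvec = lambda M, v: [sum(m * x for m, x in zip(row, v)) for row in M]
    b = [b0] * N                                   # step 1: b(0) > 0
    w = matvec(A_pinv, b)                          #         w(0) = A^# b(0)
    for k in range(max_iter + 1):
        z = matvec(A, w)                           # step 2
        if all(zi > 0 for zi in z):                # steps 3-4
            return "separable", w, k
        y = [zi - bi for zi, bi in zip(z, b)]      # step 5
        if all(yi <= 0 for yi in y) and any(yi < 0 for yi in y):
            return "inconsistent", w, k            # steps 6-7
        rho = 1.0 / math.cosh(max(abs(yi) for yi in y))
        h = [math.sinh(yi) + abs(math.sinh(yi)) for yi in y]
        b = [bi + rho * hi for bi, hi in zip(b, h)]          # step 8
        dw = matvec(A_pinv, h)
        w = [wi + rho * di for wi, di in zip(w, dw)]         # step 9
    return "undecided", w, max_iter                # step 10 is the loop itself


# w > 0 and -w > 0 cannot both hold; w > 0 and 2w > 0 can.
bad = solve_inequalities([[1.0], [-1.0]], [[0.5, -0.5]])
good = solve_inequalities([[1.0], [2.0]], [[0.2, 0.4]])
```

Both toy systems are decided at the zeroth iteration: the first is reported inconsistent because y(0) has only negative components, the second is solved by w(0) itself.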
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Figure 2. Flow Chart of the Proposed Algorithm. 
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III. APPLICATION OF THE ACCELERATED ALGORITHM 
TO SWITCHING FUNCTIONS 



A. A Special Algorithm for Switching Functions 



For a switching function of r binary variables, $x_1, x_2, \ldots, x_r$,
one is concerned with the vertices of an r-cube, each vertex being assigned
to only one of the two classes $C_1$ or $C_2$. It is required to find a
separating hyperplane, if one exists, between the two classes.

Let an n by 1 vector x (n = r+1), as defined in (1.1), be associated
with each vertex of the hypercube, that is,

\[
x^t = [x_0, x_1, \ldots, x_r],
\]

where $x_0$ is the threshold attribute which will always equal +1 and
the components $x_1, \ldots, x_r$ are the coordinates of a vertex of the
r-dimensional hypercube. Assume that each $x_i$, (i = 1, 2, ..., r), may take
on values +1 and -1 instead of +1 and 0. Let

\[
\{\,{}_1x_1,\ {}_2x_1,\ \ldots,\ {}_{n_1}x_1\,\} \in \text{Class } C_1,
\qquad
\{\,{}_{(n_1+1)}x_2,\ {}_{(n_1+2)}x_2,\ \ldots,\ {}_{m}x_2\,\} \in \text{Class } C_2,
\]

where the intersection of class 1 and class 2 is the empty set. Each one
of the $2^r$ vertices of the r-cube is allotted to one class or the other.
Then the total number of pattern vectors of the two classes is $m = 2^r$.

Finding a separating hyperplane g(x) = 0 between the two classes is
equivalent to finding a weight vector w as defined in (1.2) such that

\[
g({}_jx) = {}_jx^t w
\begin{cases}
> 0 & \text{for } j = 1, 2, \ldots, n_1 \\
< 0 & \text{for } j = n_1+1, \ldots, m
\end{cases}
\tag{3.1}
\]

which is the same as

\[
\sum_{i=2}^{n} {}_jx_{i-1}\, w_i
\begin{cases}
> -w_1 & \text{for } j = 1, \ldots, n_1 \\
< -w_1 & \text{for } j = n_1+1, \ldots, m
\end{cases}
\]

where $-w_1$ is called the threshold value and the $w_i$'s, (i = 2, 3, ..., n), are
called weights for the switching function $g(x) = x^t w$. Write all ${}_jx$ in
a compact matrix form as defined in (1.5),

\[
A = \begin{bmatrix}
{}_1x_1^t \\ {}_2x_1^t \\ \vdots \\ {}_{n_1}x_1^t \\ -{}_{(n_1+1)}x_2^t \\ \vdots \\ -{}_{m}x_2^t
\end{bmatrix}.
\tag{3.2}
\]

The set of equations (3.1) can be rewritten as

\[
A\,w > 0 .
\]

Then a weight vector w of the Boolean function g(x) can be obtained by
solving the above inequalities. If a separating hyperplane or a
switching function does not exist, then the above inequalities will be
inconsistent.
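
The construction of the pattern matrix can be sketched as follows. The helper names `encode` and `pattern_matrix` are hypothetical, and the convention that the most significant bit of a vertex number is the first variable is an assumption chosen to match the vertex numbering of the examples in this chapter. The vertex sets used below are those of a three-variable function with true vertices {0, 4, 5, 6}.

```python
def encode(vertex, r):
    """Augmented +/-1 pattern vector of a vertex given as an integer index.

    The most significant bit is taken as the first variable (an assumption
    that matches the vertex numbering of the examples in this chapter).
    """
    bits = [(vertex >> (r - 1 - i)) & 1 for i in range(r)]
    return [1] + [1 if bit else -1 for bit in bits]


def pattern_matrix(class1, class2, r):
    """Rows of class C1 as they are, rows of class C2 with the sign reversed."""
    rows = [encode(v, r) for v in class1]
    rows += [[-x for x in encode(v, r)] for v in class2]
    return rows


A = pattern_matrix([0, 4, 5, 6], [1, 2, 3, 7], r=3)
# Gram matrix A^t A; for a complete truth table its columns are orthogonal.
gram = [[sum(row[i] * row[j] for row in A) for j in range(4)] for i in range(4)]
```

As a side check, the Gram matrix comes out as 8 times the identity, which is what makes the generalized inverse of such a matrix a scaled transpose.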



The accelerated algorithm developed in the previous chapter will
be used to obtain a suitable weight vector for each of the switching
functions considered in the next section to show its high convergence
rate and effectiveness. Following the Ho-Kashyap discussion, the
algorithm can be significantly simplified, however, owing to the special
nature of switching functions. An essential property of the binary
pattern matrix A is

\[
A^t A = 2^{n-1} I \tag{3.3}
\]

and

\[
A^{\#} = (A^t A)^{-1} A^t = (2^{n-1} I)^{-1} A^t = 2^{-(n-1)} A^t \tag{3.4}
\]

for switching functions. The accelerated algorithm in equation (2.29) becomes

\[
\begin{aligned}
w(0) &= 2^{-(n-1)} A^t b(0), \qquad b(0) > 0 \text{ but otherwise arbitrary} \\
y(k) &= A\,w(k) - b(k) \\
b(k+1) &= b(k) + \rho(k)\,h(k) \\
w(k+1) &= w(k) + 2^{-(n-1)} \rho(k)\, A^t h(k)
\end{aligned}
\tag{3.5}
\]

where h(k) is given by equation (2.25), and ρ(k) can be given by either
equation (2.26) or equation (2.47). A digital computer program for the
special algorithm (3.5) has been written in MAD language and is listed
in Appendix A.
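
A compact Python sketch of the special algorithm (3.5) is given here for orientation; it is not a transcription of the MAD program. It assumes a complete truth table (so that the number of rows m equals 2^(n-1)), uses ρ(k) = 1/cosh(y_max(k)), and takes h(k) = sinh y + |sinh y|. The function names, the default b(0) with all components equal to 1, and the iteration cap are choices made for this sketch. The two calls at the end use the vertex sets of Examples 1 and 2 of the next section.

```python
import math

def pattern_matrix(class1, class2, r):
    """+/-1 pattern matrix with x0 = +1; class C2 rows are sign-reversed."""
    def encode(v):
        return [1] + [1 if (v >> (r - 1 - i)) & 1 else -1 for i in range(r)]
    return [encode(v) for v in class1] + [[-x for x in encode(v)] for v in class2]


def solve_switching(A, b0=1.0, max_iter=1000):
    """Algorithm (3.5) with rho(k) = 1/cosh(y_max(k)); returns (status, w, k)."""
    m = len(A)
    scale = 1.0 / m                       # 2^-(n-1), valid when m = 2^(n-1)
    At = [list(col) for col in zip(*A)]   # transpose of A
    dot = lambda u, v: sum(a * c for a, c in zip(u, v))
    b = [b0] * m
    w = [scale * dot(col, b) for col in At]
    for k in range(max_iter + 1):
        z = [dot(row, w) for row in A]
        if all(zi > 0 for zi in z):
            return "separable", w, k
        y = [zi - bi for zi, bi in zip(z, b)]
        if all(yi <= 0 for yi in y) and any(yi < 0 for yi in y):
            return "not separable", w, k
        rho = 1.0 / math.cosh(max(abs(yi) for yi in y))
        h = [math.sinh(yi) + abs(math.sinh(yi)) for yi in y]
        b = [bi + rho * hi for bi, hi in zip(b, h)]
        w = [wj + scale * rho * dot(col, h) for wj, col in zip(w, At)]
    return "undecided", w, max_iter


# Vertex sets of Example 1 (three variables) and Example 2 (four variables).
ex1 = solve_switching(pattern_matrix([0, 4, 5, 6], [1, 2, 3, 7], 3))
ex2 = solve_switching(pattern_matrix([7] + list(range(9, 16)),
                                     list(range(7)) + [8], 4))
```

With these inputs the sketch stops at the zeroth iteration for the first function and after one iteration for the second, in agreement with the iteration counts reported for the two examples.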



B. Example Problems 

Seven switching function problems are presented to demonstrate 
the effectiveness of the accelerated algorithm. Comparisons are made 
between the results obtained by this algorithm and those obtained by the
Ho-Kashyap algorithm to illustrate the improved convergence rate. The
first two examples are explained in detail while the results of the 
other five examples are given and discussed. Example 3 is a Boolean 
switching function defined by Winder as a testing function for newly 
created procedures for switching problems. 

1. Example 1 : A switching function of three binary variables. 

Consider a Boolean function of three binary variables A, B, and C, with

T = A B' + A C' + B'C'

F = B C + A'C + A'B .

Designate the true x's as of class $C_1$ and the false x's as of class
$C_2$. Then

Class $C_1$ = {0, 4, 5, 6} = $\{{}_1x_1, {}_2x_1, {}_3x_1, {}_4x_1\}$

Class $C_2$ = {1, 2, 3, 7} = $\{{}_5x_2, {}_6x_2, {}_7x_2, {}_8x_2\}$.

Using (1,-1), instead of (1,0), for the binary representation of $x_i$,
(i = 1, ..., r; r = 3), one obtains

${}_1x_1^t$ = (1, -1, -1, -1)

${}_2x_1^t$ = (1, 1, -1, -1)

${}_3x_1^t$ = (1, 1, -1, 1)

${}_4x_1^t$ = (1, 1, 1, -1)

${}_5x_2^t$ = (1, -1, -1, 1)

${}_6x_2^t$ = (1, -1, 1, -1)

${}_7x_2^t$ = (1, -1, 1, 1)

${}_8x_2^t$ = (1, 1, 1, 1)

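Note that $x_0$ is always equal to +1 and, in this case, n = 4, m = 8, and
A is

\[
A = \begin{bmatrix}
{}_1x_1^t \\ {}_2x_1^t \\ {}_3x_1^t \\ {}_4x_1^t \\ -{}_5x_2^t \\ -{}_6x_2^t \\ -{}_7x_2^t \\ -{}_8x_2^t
\end{bmatrix}
=
\begin{bmatrix}
 1 & -1 & -1 & -1 \\
 1 &  1 & -1 & -1 \\
 1 &  1 & -1 &  1 \\
 1 &  1 &  1 & -1 \\
-1 &  1 &  1 & -1 \\
-1 &  1 & -1 &  1 \\
-1 &  1 & -1 & -1 \\
-1 & -1 & -1 & -1
\end{bmatrix}.
\]
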
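Choose

\[
b^t(0) = [1, 1, 1, 1, 1, 1, 1, 1];
\]

then

\[
w(0) = 2^{-(n-1)} A^t b(0)
= \frac{1}{8}\begin{bmatrix} 0 \\ 4 \\ -4 \\ -4 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1/2 \\ -1/2 \\ -1/2 \end{bmatrix}.
\]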
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The matrix 



% • 
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and 



\[
A\,w(0) = \tfrac{1}{2}\,[\,1\ \ 3\ \ 1\ \ 1\ \ 1\ \ 1\ \ 3\ \ 1\,]^t > 0 .
\]

Since A w(0) > 0, the procedure terminates at the zeroth iteration. This
result is the same as obtained from the Ho-Kashyap algorithm. A switching
function $g(x) = x^t w$ is obtained by taking the threshold element $-w_1 = 0$
and the three weight components $w_2 = 1/2$, $w_3 = -1/2$, $w_4 = -1/2$. Note that
this w(0) should also satisfy the relationships for the Boolean function
T and F, and it does, as shown below:

A B'  ->  1/2 - (-1/2) = 1 > 0
A C'  ->  1/2 - (-1/2) = 1 > 0
B'C'  ->  -(-1/2) - (-1/2) = 1 > 0

B C   ->  (-1/2) + (-1/2) = -1 < 0
A'C   ->  -(1/2) + (-1/2) = -1 < 0
A'B   ->  -(1/2) + (-1/2) = -1 < 0
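
The same check can be carried out over all eight vertices rather than over the product terms only. The sketch below is illustrative: the function name `g`, the list `mismatches`, and the bit-to-variable convention (most significant bit is A) are assumptions made here, while the weight vector and the set of true vertices are those of this example.

```python
def g(w, a, b, c):
    """Discriminant x^t w for inputs a, b, c coded as +1 (true) or -1 (false)."""
    return w[0] + w[1] * a + w[2] * b + w[3] * c

w0 = [0.0, 0.5, -0.5, -0.5]
true_vertices = {0, 4, 5, 6}          # class C1 of Example 1

mismatches = []
for v in range(8):
    # Bits of the vertex number, most significant first, mapped to +1 / -1.
    a, b, c = [1 if (v >> s) & 1 else -1 for s in (2, 1, 0)]
    if (g(w0, a, b, c) > 0) != (v in true_vertices):
        mismatches.append(v)
```

An empty `mismatches` list means that the sign of g(x) reproduces the truth table of T at every vertex.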

2. Example 2: A switching function of four binary variables.

In a Boolean function of four variables A, B, C, and D, consider

T = B C D + A C + A D + A B

F = B'C'D' + A'C' + A'D' + A'B' .

This corresponds to Class $C_1$ and Class $C_2$ of x,

Class $C_1$ = {7, 9 to 15} = $\{{}_1x_1, \ldots, {}_8x_1\}$

Class $C_2$ = {0 to 6, 8} = $\{{}_9x_2, \ldots, {}_{16}x_2\}$.

Using (1,-1) for the binary representation of $x_i$, (i = 1, ..., r; r = 4), one
obtains

${}_1x_1^t$ = (1, -1, 1, 1, 1)

${}_2x_1^t$ = (1, 1, -1, -1, 1)

${}_3x_1^t$ = (1, 1, -1, 1, -1)

${}_4x_1^t$ = (1, 1, -1, 1, 1)

${}_5x_1^t$ = (1, 1, 1, -1, -1)

${}_6x_1^t$ = (1, 1, 1, -1, 1)

${}_7x_1^t$ = (1, 1, 1, 1, -1)

${}_8x_1^t$ = (1, 1, 1, 1, 1)

${}_9x_2^t$ = (1, -1, -1, -1, -1)

${}_{10}x_2^t$ = (1, -1, -1, -1, 1)

${}_{11}x_2^t$ = (1, -1, -1, 1, -1)

${}_{12}x_2^t$ = (1, -1, -1, 1, 1)

${}_{13}x_2^t$ = (1, -1, 1, -1, -1)

${}_{14}x_2^t$ = (1, -1, 1, -1, 1)

${}_{15}x_2^t$ = (1, -1, 1, 1, -1)

${}_{16}x_2^t$ = (1, 1, -1, -1, -1)

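The m by n matrix A, where n = 5 and m = 16, is formed as in (3.2), with the
eight class $C_1$ vectors as its first eight rows and the negatives of the
eight class $C_2$ vectors as its last eight rows.

Choose

\[
b^t(0) = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1];
\]

then

\[
w(0) = 2^{-(n-1)} A^t b(0)
= \frac{1}{16}\begin{bmatrix} 0 \\ 12 \\ 4 \\ 4 \\ 4 \end{bmatrix}
= \begin{bmatrix} 0 \\ .75 \\ .25 \\ .25 \\ .25 \end{bmatrix}.
\]

Since $A\,w(0) \not> 0$, one has to determine y(0), h(0), and ρ(0)
to proceed with the algorithm and calculate w(1). Let
ρ(k) in equation (2.26) be used,

\[
\rho(0) = \frac{1}{\cosh y_{\max}(0)} = \frac{1}{\cosh 1} = \frac{1}{1.54305},
\]

where

\[
y_{\max}(0) = \max_i |y_i(0)| = 1 .
\]

Since

\[
h^t(0) = \sinh y^t(0) + |\sinh y^t(0)|
= [0,0,0,0,0,0,0,1.04218,1.04218,0,0,0,0,0,0,0],
\]

the algorithm terminates after the first iteration, where A w(1) > 0 with

\[
w(1) = \frac{1}{16}\begin{bmatrix} 0 \\ 12 \\ 4 \\ 4 \\ 4 \end{bmatrix}
+ \frac{1}{16}\begin{bmatrix} 0 \\ .67546 \\ .67546 \\ .67546 \\ .67546 \end{bmatrix}
= \frac{1}{16}\begin{bmatrix} 0 \\ 12.67546 \\ 4.67546 \\ 4.67546 \\ 4.67546 \end{bmatrix}.
\]

This is a desired weight vector w to be used in the switching function
$g(x) = x^t w$.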

3. Example 3: Winder's problem (6) of eight binary variables

Consider a Boolean function of eight binary variables which corresponds
to the separation of two classes:

Class 1 = {27 to 31, 39, 41 to 47, 49 to 63, 71, 73 to 79,
81 to 127, 131, 133 to 255}
= $\{{}_jx_1\}$, (j = 1, 2, ..., 207)

Class 2 = {0 to 26, 32 to 38, 40, 48, 64 to 70, 72, 80,
128 to 130, 132}
= $\{{}_jx_2\}$, (j = 208, ..., 256)

Here n = 9 and m = 256. For

\[
b^t(0) = [1, 1, 1, \ldots, 1, 1, 1]
\]

and ρ(k) given in equation (2.26), the algorithm terminates after the 4th
iteration and gives a solution weight vector for the switching function
$g(x) = x^t w$,

\[
w^t = [\,1.0077\ \ 0.6136\ \ 0.4694\ \ 0.4694\ \ 0.3508\ \ 0.3508\ \ 0.1704\ \ 0.1405\ \ 0.1405\,].
\]


4. Example 4: A switching function of six binary variables

Consider a Boolean function of six binary variables which
corresponds to the separation of two classes:

Class $C_1$ = {30, 31, 41 to 63} = $\{{}_jx_1\}$, (j = 1, ..., 25)

Class $C_2$ = {0 to 29, 32 to 40} = $\{{}_jx_2\}$, (j = 26, ..., 64)

Here n = 7 and m = 64. For

\[
b^t(0) = [.1, .1, .1, \ldots, .1, .1, .1]
\]

and ρ(k) given in equation (2.47), the algorithm terminates after
the 1st iteration and gives a solution weight vector for the
switching function $g(x) = x^t w$,

\[
w^t = [\,-0.8287\ \ 1.9149\ \ 1.2763\ \ 0.9954\ \ 0.3425\ \ 0.3425\ \ 0.1246\,].
\]

5. Example 5: Another switching problem of six binary variables

Consider a Boolean function of six binary variables which
corresponds to the separation of two classes:

Class $C_1$ = {46, 47, 53 to 63} = $\{{}_jx_1\}$, (j = 1, 2, ..., 13)

Class $C_2$ = {0 to 45, 48 to 52} = $\{{}_jx_2\}$, (j = 14, ..., 64)

Here n = 7 and m = 64. For

\[
b^t(0) = [1, 1, 1, \ldots, 1, 1, 1]
\]

and ρ(k) given in equation (2.26), the algorithm terminates after
the 1st iteration and gives a solution weight vector w for the
switching function $g(x) = x^t w$,

\[
w^t = [\,-0.7598\ \ 0.5723\ \ 0.4219\ \ 0.3281\ \ 0.2344\ \ 0.1406\ \ 0.0469\,].
\]

6. Example 6: Another switching problem of eight binary variables

Consider a Boolean function of eight binary variables which
corresponds to the separation of two classes:

Class $C_1$ = {127, 191, 215, 217 to 255} = $\{{}_jx_1\}$, (j = 1, ..., 42)

Class $C_2$ = {0 to 126, 128 to 190, 192 to 214, 216}
= $\{{}_jx_2\}$, (j = 43, ..., 256)

Here n = 9 and m = 256. For

\[
b^t(0) = [.1, .1, .1, \ldots, .1, .1, .1]
\]

and ρ(k) given in equation (2.47), the algorithm terminates after
the 10th iteration and gives a solution weight vector for the
switching function $g(x) = x^t w$,

\[
w^t = [\,0.3732\ \ 0.2278\ \ 0.2278\ \ 0.1654\ \ 0.0769\ \ 0.0569\ \ 0.0247\ \ 0.0247\ \ 0.0247\,].
\]




7. Example 7: A nonlinearly separable problem of eight binary variables

Consider the following two classes of vertices of an eight-dimensional
hypercube:

Class $C_1$ = {5 to 11, 20, 21, 27, 28, 35, 36, 44, 51, 60, 76, 91, 92, 106,
107, 121, 122, 136, 137, 151, 152, 167, 182, 183, 197, 198,
212, 213, 227, 228, 243 to 252} = $\{{}_jx_1\}$, (j = 1, ..., 46)

Class $C_2$ = {0 to 4, 12 to 19, 22 to 26, 29 to 34, 37 to 43,
45 to 50, 52 to 59, 61 to 75, 77 to 90, 93 to 105,
108 to 120, 123 to 135, 138 to 150, 153 to 166,
168 to 181, 184 to 196, 199 to 211, 214 to 226,
229 to 242, 253 to 255} = $\{{}_jx_2\}$, (j = 47, ..., 256)

Here n = 9 and m = 256. With an initial vector b(0) > 0
and ρ(k) given in equation (2.47), after the zeroth iteration, the
algorithm gives $y \le 0$, which indicates that the given sets of vertices
are not linearly separable.

8. Discussion

The last five example problems have been solved by the use
of the proposed algorithm with various values of b(0) and with either
$\rho(k) = 1/\cosh y_{\max}(k)$ as given by (2.26) or ρ(k) given by (2.47). In all
cases, b(0) has equal components, i.e., $b_1(0) = b_2(0) = \cdots = b_m(0)$. The
numbers of iterations required to solve the example problems in all
experiments are shown in Table 1 and Table 2. These example problems
have also been solved using the Ho-Kashyap algorithm, and the results
are shown in Table 3 and Table 4. Note that in each of these examples
with the Ho-Kashyap algorithm the number of iterations required does not
change for different initial values of the b(0) vector. But the number
of iterations required does change for different initial values of the
b(0) vector with the proposed algorithm, as shown in Table 1. This is
so because b(0) influences ρ(k). Also note that the number of iterations
required for the proposed algorithm with $\rho(k) = 1/\cosh y_{\max}(k)$ and
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Table 1 . Number of iterations required to solve the example 
problems using the proposed algorithm with $\rho(k) = 1/\cosh y_{\max}(k)$.

 b_i(0)  |           Example No.
         |    3      4      5      6      7*
 --------+------------------------------------
 2.0     |    9    112      2             0
 1.0     |    4     42      1
 0.5     |    3     29      1             0
 0.2     |    3     26      1    231      0
 0.1     |    3     25      1    229
 0.05    |    3     25      1    229
 0.01    |    3     25      1    229
 0.001   |    3     25      1    229
 10^-4   |    3     25      1    230
 10^-5   |    3     25      1    245
 10^-6   |    3     29      1    340
 10^-7   |    5     32      1

*Not linearly separable.


Table 2. Number of iterations required to solve the example
problems using the proposed algorithm with ρ(k) given by (2.47).

 b_i(0)  |           Example No.
         |    3      4      5      6      7*
 --------+------------------------------------
 2.0     |
 1.0     |
 0.5     |
 0.2     |
 0.1     |    2      1      1     10      0
 0.05    |
 0.01    |    2      1      1     10
 0.001   |
 10^-4   |
 10^-5   |
 10^-6   |

*Not linearly separable.


Table 3. Number of iterations required to solve the example
problems using the Ho-Kashyap algorithm with ρ = 0.5.

 b_i(0)  |           Example No.
         |    3      4      5      6      7*
 --------+------------------------------------
 2.0     |    5     52      1             0
 1.0     |    5     52      1
 0.5     |    5     52      1             0
 0.2     |    5     52      1             0
 0.1     |    5     52      1    462
 0.05    |    5     52      1    462
 0.01    |    5     52      1    462
 0.001   |    5     52      1    462
 10^-4   |    5     52      1
 10^-5   |    5     52      1
 10^-6   |    5     52
 10^-7   |    5     52      1

*Not linearly separable.


Table 4. Number of iterations required to solve the example
problems using the Ho-Kashyap algorithm with ρ = 1.0.

 b_i(0)  |           Example No.
         |    3      4      5      6      7*
 --------+------------------------------------
 2.0     |    3     25      1
 1.0     |    3     25      1
 0.5     |    3     25      1             0
 0.2     |    3     25      1             0
 0.1     |    3     25      1    229
 0.05    |    3     25      1    229
 0.01    |    3     25      1    229
 0.001   |    3     25      1    229
 10^-4   |    3     25      1    229
 10^-5   |    3     25      1    229
 10^-6   |    3     25      1    229
 10^-7   |    3     25      1

*Not linearly separable.

$0.5 > b_i(0) \ge 0.001$ is less than that for the Ho-Kashyap algorithm
with ρ = 0.5, and is equal to that for the Ho-Kashyap algorithm with
ρ = 1.0. The value of ρ = 1.0 for the Ho-Kashyap algorithm minimizes
the number of iterations required for switching functions (13). For
extremely small $b_i(0)$, $b_i(0) \le 10^{-4}$, as well as larger $b_i(0)$,
$b_i(0) > 0.1$, the proposed algorithm with $\rho(k) = 1/\cosh y_{\max}(k)$ may take
more iterations. For the proposed algorithm with the optimum ρ(k)
given by (2.47), the number of iterations required is less than or
equal to that of the Ho-Kashyap algorithm with ρ = 1.0. In the
problems where the Ho-Kashyap algorithm required a very large number
of iterations, the proposed algorithm reduced this number by a fairly
large factor.

It has been observed in these experiments that the proposed 
algorithm reduced the computing time also. For example, for problems 
requiring a few iterations for the Ho-Kashyap algorithm the total 
computing time was reduced from 90 seconds to 19 seconds and execution 
time reduced from 30 seconds to 10 seconds with a dollar saving of $4.00, 
from $5.00 to $1.00. For problems requiring a large number of iterations 
for the Ho-Kashyap algorithm the proposed algorithm reduced the total 
computing time from 80 minutes to 50 seconds and execution time from 
30 minutes to 5 seconds with a cost reduction of $22.00, from $23.50 
to $1.50. 

For a given problem, different initial values of the b(0)
vector lead to different solution weight vectors w.
It has also been observed that if two b(0) vectors differed by
a constant factor, the solution weight vectors thus obtained also
differed by the same factor as long as the number of iterations required
remained the same.

Let the number of elements of A w(k) that are less than zero
be designated as an error index for a set of linear inequalities at
the k-th iteration step. This error index is represented by the number
of $A_i w$ that are less than zero, where $A_i$ is the i-th row of the matrix
A. Table 5 shows the number of $A_i w > 0$ observed in the experiments
for examples 3, 4, 5, and 6 using both the Ho-Kashyap algorithm and
the proposed algorithm with $b^t(0) = [0.1, 0.1, \ldots, 0.1]$. The sum of the
numbers of $A_i w > 0$ and $A_i w < 0$ equals $m = 2^r$, which for examples 3, 4, 5, and 6
equals 256, 64, 64, and 256 respectively. Note that, after the zeroth
iteration, this error index for the proposed algorithm with
$\rho(k) = 1/\cosh y_{\max}(k)$
is less than or equal to the error index for the Ho-Kashyap algorithm
with ρ = 0.5 and is equal to that for the Ho-Kashyap algorithm with ρ = 1.0. The error
index for the proposed algorithm with ρ(k) given by (2.47) is always
less than or equal to that for the Ho-Kashyap algorithm with ρ = 1.0. This
error information assures the effectiveness of the proposed algorithm.

For the algorithm developed there is no guarantee that all
$w_i > 0$, (i = 1, ..., n), which is necessary for a threshold logic circuit
realizable by transistors. Since there is no prior knowledge about a Boolean
function, one does not know if it is linearly separable by a weight
vector with all positive elements.

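
The error index and the quantity tabulated in Table 5 are simple row counts, as the following sketch shows. The function names and the small 3 by 2 matrix are hypothetical and serve only to illustrate the counting.

```python
def error_index(A, w):
    """Number of rows A_i with A_i w < 0, i.e. inequalities still violated."""
    return sum(1 for row in A if sum(a * x for a, x in zip(row, w)) < 0)


def satisfied_count(A, w):
    """Number of rows with A_i w > 0, the quantity tabulated in Table 5."""
    return sum(1 for row in A if sum(a * x for a, x in zip(row, w)) > 0)


A = [[1, 1], [1, -1], [-1, 1]]      # illustrative pattern matrix
w = [1.0, 2.0]                      # illustrative weight vector
counts = (error_index(A, w), satisfied_count(A, w))
```

Here the three products are 3, -1, and 1, so one inequality is violated and two are satisfied.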
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Table 5. Comparison of the Error Indices for the Proposed Algorithm 
and the Ho-Kashyap Algorithm with $b_i(0) = 0.1$ for all i.
(Entries are the number of rows with $A_i w > 0$.)

 Example  Iteration | Ho-Kashyap  Ho-Kashyap  Proposed ρ(k)  Proposed ρ(k)
   No.       No.    |  ρ = 0.5     ρ = 1.0    Eq. (2.26)     Eq. (2.47)
 -------------------+------------------------------------------------------
    3         0     |    241         241         241            241
              1     |    250         254         254            242
              2     |    250         254         254            256
              3     |    254         256         256
              4     |    254
              5     |    256
 -------------------+------------------------------------------------------
    4         0     |     60          60          60             60
              1     |     62          63          63             64
              2     |     62          63          63
              3     |     63          63          63
             ...    |    ...         ...         ...
             24     |     63          63          63
             25     |     63          64          64
             ...    |    ...
             51     |     63
             52     |     64
 -------------------+------------------------------------------------------
    5         0     |     62          62          62             62
              1     |     64          64          64             64
 -------------------+------------------------------------------------------
    6         0     |    242         242         242            242
              1     |    246         250         250            250
              2     |    250         250         250            250
              3     |    250         250         250            250
              4     |    250         250         250            250
              5     |    250         250         250            254
              6     |    250         250         250            254
              7     |    250         250         250            254
              8     |    250         250         250            254
              9     |    250         250         250            254
             10     |    250         250         250            256
             11     |    250         250         250
             ...    |    ...         ...         ...
             16     |    250         250         250
             17     |    250         251         251
             ...    |    ...         ...         ...
             43     |    250         251         251
             44     |    250         252         252
             ...    |    ...         ...         ...
             74     |    250         252         252
             75     |    251         252         252
             ...    |    ...         ...         ...
             90     |    251         252         252
             91     |    252         252         252
             ...    |    ...         ...         ...
            228     |    252         252         252
            229     |    252         256         256
             ...    |    ...
            461     |    252
            462     |    256


IV. APPLICATION OF THE ACCELERATED ALGORITHM 



TO PATTERN RECOGNITION 

For dichotomization of patterns other than switching function
problems, no simplification of the algorithm can be made, and the
proposed algorithm in equation (2.29) together with ρ(k) given in
equation (2.26) or equation (2.47) will be used. The generalized
inverse $A^{\#}$ of the matrix A must be calculated once per problem for the
abstraction aspect in pattern recognition. When a solution weight
vector w is obtained from the application of the algorithm, it can be
used in the pattern recognizer as illustrated in Figure 3. A digital
computer program for the algorithm (2.29) has been written in MAD
language. The calculation of $A^{\#}$ was obtained according to Kalman
and Englar's scheme. The program was originally written in FORTRAN
and then translated into MAD language to be consistent with the
language used for the proposed algorithm. The complete computer program
is included in Appendix B. The proposed algorithm has been applied to
the two pattern classification problems as described in the next two
sections.
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Figure 3. Block Diagram for a Pattern Dichotomizer 
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A. A Character Recognition Problem 

In this study patterns consisting of four pairs of hand printed
alphanumeric characters were considered. The data were obtained from the
Learning Research and Development Center at the University of Pittsburgh.
One of the Center's activities is to teach children, ages five to eight,
the alphabet and numbers, via instructional devices and computers.
Hence, the machine recognition of hand printed characters is a current
research interest in the Center. The selected pairs of characters are
similar in form and the patterns collected are representative of
children's hand printing. The four pairs considered here are A and H,
Z and 2, I and 1, and G and 6. Each character was written inside a
square with 12 by 12 divisions. Five attributes or pattern components,
$x_0, x_1, x_2, x_3, x_4$, were obtained from each pair of characters for
classification. The first attribute was the height of the character
and was normalized to be 1.0. The other attributes were certain lengths
and widths, etc., each of which was a fraction of this height. The
attributes given to describe the four pattern pairs are shown in Figure 4
to Figure 7. These represent sets of crude but simple features of hand
printed character pairs. The pattern components of character pairs for
the sample or training sets are listed in Table 6. The original hand
printed characters are reproduced in Appendix C. Note that since the
normalized height is unity for all characters, it can be assigned as
the $x_0$ component or threshold attribute of the x vector. Hence x is a
5 by 1 vector with n = 5.
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Figure 4. Pattern Components of 
the A-H Pair. 













Figure 5. Pattern Components 
of the T-1 Pair. 









Figure 7. Pattern Components of 
the G-6 Pair. 
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Table 6. Pattern Components of the Character Pairs Used in the 
Training Sets.

                 A                 |                 H
  x_0   x_1   x_2   x_3   x_4      |  x_0   x_1   x_2   x_3   x_4
  1.0   .8    .0    .5    .5       |  1.0   .6    .6    .0    .5
  1.0   .8    .0    .5    .25      |  1.0   .6    .5    .08   .5
  1.0   .8    .0    .5    .35      |  1.0   .6    .4    .16   .5
  1.0   .8    .0    .5    .75      |  1.0   .6    .3    .25   .5
                                   |  1.0   .6    .2    .3    .5
                                   |  1.0   .6    .6    .0    .3
                                   |  1.0   .6    .6    .0    .6

                 Z                 |                 2
  x_0   x_1   x_2   x_3   x_4      |  x_0   x_1   x_2   x_3   x_4
  1.0   .7    .08   .08   1.0      |  1.0   .6    .2    .08   .7
  1.0   .7    .17   .08   1.0      |  1.0   .6    .1    .08   .7
  1.0   .7    .25   .08   1.0      |  1.0   .6    .3    .08   .7
  1.0   .7    .42   .08   1.0      |  1.0   .6    .08   .08   .7
  1.0   1.0   .08   .08   1.0      |  1.0   .6    .1    .08   .5
  1.0   1.1   .08   .08   1.0      |  1.0   .6    .1    .08   .3
  1.0   1.33  .08   .08   1.0      |  1.0   .6    .1    .08   .8

                 I                 |                 1
  x_0   x_1   x_2   x_3   x_4      |  x_0   x_1   x_2   x_3   x_4
  1.0   .3    .3    .3    .0       |  1.0   .05   .0    .0    .05
  1.0   .4    .25   .4    .1       |  1.0   .2    .0    .0    .2
  1.0   .4    .3    .25   .15      |  1.0   .1    .0    .0    .1
  1.0   .7    .45   .5    .35      |  1.0   .5    .0    .0    .5
  1.0   .5    .3    .25   .25      |  1.0   .4    .0    .0    .4

                 G                 |                 6
  x_0   x_1   x_2   x_3   x_4      |  x_0   x_1   x_2   x_3   x_4
  1.0   .8    .5    .8    .0       |  1.0   .5    .5    .0    .33
  1.0   .8    .5    .2    .0       |  1.0   .5    .3    .0    .25
  1.0   .8    .5    .5    .0       |  1.0   .5    .4    .0    .33
  1.0   .8    .5    .62   .0       |
  1.0   .8    .5    .8    .25      |
  1.0   .8    .5    .8    .12      |
  1.0   .8    .5    .8    .2       |

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It was desired to recognize A against H, Z against 2, I 
against 1, G against 6, or vice versa. For each pair group, designate
patterns of one character as belonging to class $C_1$ and patterns of the
other character as belonging to class $C_2$. For example, character A belongs to class $C_1$
in the first pair, character Z belongs to class $C_1$ in the second pair,
character I belongs to class $C_1$ in the third pair, and character G
belongs to class $C_1$ in the fourth pair. For each pair group, a
discriminant function, $g(x) = x^t w$, was to be determined. As shown in
Table 6, there were eleven sample patterns for the A-H pair, fourteen
sample patterns for the Z-2 pair, and ten sample patterns each for
the I-1 and G-6 pairs. The size of matrix A varied from 10 by 5 to
14 by 5. The proposed algorithm was applied to each pair group with
$b^t(0) = [0.1, 0.1, \ldots, 0.1]$ and ρ(k) given by equation (2.47) to obtain
the following solution weight vectors, $w_{AH}$, $w_{Z2}$, $w_{I1}$, and $w_{G6}$:

\[
w_{AH} = \begin{bmatrix} .0070 \\ .0100 \\ .0001 \\ -.0001 \\ .0001 \end{bmatrix},
\quad
w_{Z2} = \begin{bmatrix} .0037 \\ .0014 \\ .0010 \\ -.0003 \\ .0031 \end{bmatrix},
\quad
w_{I1} = \begin{bmatrix} .0851 \\ .0001 \\ .1566 \\ .1743 \\ -.0001 \end{bmatrix},
\quad
w_{G6} = \begin{bmatrix} -.4333 \\ .6661 \\ .0001 \\ .0001 \\ -.0001 \end{bmatrix},
\]

where the first subscript refers to class $C_1$ and the second subscript refers
to class $C_2$ in each pair group. These solution weight vectors were all
obtained after the zeroth iteration.
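
How such a weight vector is used by a dichotomizer can be sketched as follows. The example takes the G-6 weight vector and the G-6 training patterns of Table 6 as they can be read from the listings; the leading minus sign of the first weight component is an assumption of this sketch, and the function name `classify` is hypothetical.

```python
def classify(w, x):
    """Return 1 for class C1 (x^t w > 0) and 2 for class C2 (otherwise)."""
    g = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if g > 0 else 2


# Weight vector for the G-6 pair (sign of the first component assumed negative).
w_G6 = [-0.4333, 0.6661, 0.0001, 0.0001, -0.0001]

G_samples = [
    (1.0, 0.8, 0.5, 0.8, 0.0), (1.0, 0.8, 0.5, 0.2, 0.0),
    (1.0, 0.8, 0.5, 0.5, 0.0), (1.0, 0.8, 0.5, 0.62, 0.0),
    (1.0, 0.8, 0.5, 0.8, 0.25), (1.0, 0.8, 0.5, 0.8, 0.12),
    (1.0, 0.8, 0.5, 0.8, 0.2),
]
six_samples = [
    (1.0, 0.5, 0.5, 0.0, 0.33), (1.0, 0.5, 0.3, 0.0, 0.25),
    (1.0, 0.5, 0.4, 0.0, 0.33),
]
labels = [classify(w_G6, x) for x in G_samples + six_samples]
```

Under these assumptions the discriminant is dominated by the threshold term and the x_1 component, which is 0.8 for every G sample and 0.5 for every 6 sample.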




The solution weight vectors were also tested on some new sample
patterns. Only in the Z-2 pair group was there one misclassification
among a total of twelve new sample patterns. The misclassified pattern was
a 2 which was written so ambiguously that even a human observer could
hardly distinguish it from Z.

B. A Biomedical Pattern Recognition Problem 

The proposed algorithm was also applied to a biomedical pattern
recognition problem. The problem is to investigate whether or not a
change exists in the diurnal cycle of an individual person upon a change
in his environmental condition or physiological state, and if such a change
may be used to diagnose physical ailments under strictly controlled
conditions by measuring the amounts of electrolytes present in urine and
blood samples every three hours. The problem and data were presented
by Dr. Venucci of the School of Medicine, University of Pittsburgh. The
data consisted of thirteen sample patterns under two different conditions.
Each pattern has eight components which represent the concentrations of
electrolytes. Thus N = 13 and n = r+1 = 8+1 = 9; the size of the pattern
matrix A is 13 by 9. The pattern matrix A is shown in Table 7. Let
$b^t(0) = [0.1, 0.1, \ldots, 0.1]$. For this problem the Ho-Kashyap algorithm
with ρ = 1 required 927 iterations to determine the separability. However,
the proposed algorithm with ρ(k) given by equation (2.47) required only
two iterations, where ρ(0) = 5.270684 and ρ(1) = 3.197152. The problem
is linearly separable and a solution weight vector w obtained by the
proposed algorithm is

\[
w^t = w^t(2) = [\,13.6089\ \ 2.5915\ \ 1.6847\ \ 2.2314\ \ 0.3414\ \ 3.0077\ \ 1.8428\ \ 1.6559\ \ 0.0096\,].
\]

Table 7. The Pattern Matrix A from a Biomedical Experiment

It was observed in this case that the proposed algorithm reduced the
number of iterations required by a factor of approximately 450 over that
required for the Ho-Kashyap algorithm.

Notice that in the examples above components of a weight vector
for a given pattern may differ in magnitude by as much as 1700. Although
the magnitudes of the attributes differ by as much as 200, it is possible
that some of the attributes are not necessary to describe the pattern.
This result is determined by noticing the small effect that the products
of these attributes and their corresponding weights have on the
inequality. For economic reasons one would choose the least number
of attributes to describe a pattern, but for flexibility and reliability
it is necessary to have sufficient attributes. This suggests the
development of an experimental procedure to select an adequate set
of attributes.
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V. GENERALIZATION OF THE ACCELERATED ALGORITHM OF LINEAR 
INEQUALITIES TO MULTICLASS PATTERN CLASSIFICATION 

A. Preliminary Remarks 

The problem of multiclass pattern classification is to determine
to which of the R different classes, $C_1, C_2, \ldots, C_R$,
a given pattern vector x belongs. If the R-class patterns are
linearly separable, there exist R weight vectors $w_j$ to construct R
discriminant functions $g_j(x)$, (j = 1, ..., R), such that

\[
g_j(x) = x^t w_j > x^t w_i = g_i(x) \quad \text{for all } i \ne j,\ x \in C_j. \tag{5.1}
\]

The improved algorithm for dichotomization obtained in Chapter II
will be generalized to multiclass pattern classification. A
similar criterion function will be specified and a convergent iterative
algorithm will be devised, incorporating the gradient descent procedure,
to make the proposed multiclass algorithm a direct analog of the
previously described dichotomous algorithm.

The notion of an equilateral simplex will be used. Chaplin
and Levadi have formulated another set of inequalities, other than
(5.1), which can be considered as the representation of linear separation
of R-class patterns. This set of inequalities is

\[
\|U^t x - e_j\| < \|U^t x - e_i\| \quad \text{for all } i \ne j,\ x \in C_j,\ \text{for all } j = 1, 2, \ldots, R \tag{5.2}
\]

where U is an n by (R-1) weight matrix and the vectors $e_j$ are the vertex
vectors of an (R-1)-dimensional equilateral simplex with its centroid at the
origin. If each $e_j$ is associated with one class $C_j$, x is classified according
to the nearest neighborhood of the mapping $U^t x$, as illustrated in Figure 8.
The (R-1) by 1 vectors $e_j$ have the following properties:

\[
\|e_j\| = 1 \ \text{ for all } j = 1, 2, \ldots, R, \qquad
\sum_{j=1}^{R} e_j = 0, \qquad
\|e_j - e_i\| = \|e_j - e_k\| \ \text{ for all } i, k \ne j \tag{5.3}
\]

and

\[
e_j^t (e_j - e_i) > 0 \quad \text{for all } i \ne j. \tag{5.4}
\]

The components of $e_j$, $e_j^t = [e_{j1}, \ldots, e_{ji}, \ldots, e_{j(R-1)}]$, are determined as
follows:

\[
e_{ji} =
\begin{cases}
\left[\dfrac{R\,(R-i)}{(R-1)(R-i+1)}\right]^{1/2} & \text{for } j = i \\[2ex]
-\left[\dfrac{R}{(R-1)(R-i)(R-i+1)}\right]^{1/2} & \text{for } j > i \\[2ex]
0 & \text{for } j < i
\end{cases}
\tag{5.5}
\]

(j = 1, 2, ..., R; i = 1, 2, ..., R-1).
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Figure 8. Equilateral Simplex Vertices and Nearest Neighborhood 
Mapping of Pattern Vectors , R * 3. 
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Inequalities (5.2) are, In fact, equivalent to Inequalities (5. 
will be shown below. Rewriting inequalities (5.2), one obtains

$$(x^t U - e_j^t)(U^t x - e_j) < (x^t U - e_i^t)(U^t x - e_i)$$

or

$$x^t U U^t x - x^t U e_j - e_j^t U^t x + e_j^t e_j < x^t U U^t x - x^t U e_i - e_i^t U^t x + e_i^t e_i \quad \text{for all } i \ne j,\ x \in C_j. \tag{5.6}$$

Since, from (5.3),

$$e_j^t e_j = 1 = e_i^t e_i,$$

equation (5.6), upon simplification, reduces to

$$-x^t U (e_j - e_i) < -(e_i^t - e_j^t) U^t x \quad \text{for all } i \ne j,\ x \in C_j$$

$$x^t U (e_j - e_i) > -[x^t U (e_j - e_i)]^t \quad \text{for all } i \ne j,\ x \in C_j. \tag{5.7}$$

Since $x^t U (e_j - e_i)$ is a scalar, the above inequality implies

$$x^t U (e_j - e_i) > 0 \quad \text{for all } i \ne j,\ x \in C_j. \tag{5.8}$$

Let

$$w_j = U e_j, \quad j = 1, 2, \ldots, R. \tag{5.9}$$

Then (5.8) becomes

$$x^t w_j > x^t w_i \quad \text{for all } i \ne j,\ x \in C_j$$

which is (5.1).
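
The vertex construction (5.5) and the equivalence of (5.2) and (5.1) can be checked numerically. The following is a minimal Python sketch under stated assumptions, not part of the original MAD programs: NumPy is assumed, and the function names `simplex_vertices`, `classify_nearest_vertex`, and `classify_discriminant` are hypothetical names chosen for illustration. It builds the vertices from (5.5) and compares the nearest-vertex rule (5.2) with the discriminant rule (5.1), using $w_j = U e_j$ from (5.9).

```python
import numpy as np

def simplex_vertices(R):
    """Return an R x (R-1) array whose j-th row is the vertex e_j of eq. (5.5)."""
    e = np.zeros((R, R - 1))
    for i in range(1, R):                      # component index i = 1, ..., R-1
        val = np.sqrt(R * (R - i) / ((R - 1) * (R - i + 1)))
        e[i - 1, i - 1] = val                  # case j = i
        e[i:, i - 1] = -val / (R - i)          # case j > i (entries with j < i stay 0)
    return e

def classify_nearest_vertex(U, x, e):
    """Rule (5.2): index of the vertex closest to the mapped point U^t x."""
    return int(np.argmin(np.linalg.norm(e - U.T @ x, axis=1)))

def classify_discriminant(U, x, e):
    """Rule (5.1) with w_j = U e_j, eq. (5.9): index of the largest x^t w_j."""
    W = U @ e.T                                # columns are w_1, ..., w_R
    return int(np.argmax(x @ W))
```

For any weight matrix U and pattern x, barring exact ties, the two rules should return the same class index, because $\|U^t x - e_j\|^2$ differs from $-2 x^t U e_j$ only by terms that do not depend on j. This is the content of steps (5.6) through (5.9).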

However, in order to generalize the dichotomous algorithm of
Chapter II to a multiclass algorithm, additional information of linear
inequalities is necessary (15). Let the N by n pattern matrix A be
defined in the following manner.

$$A = \begin{bmatrix} A_1 \\ \vdots \\ A_j \\ \vdots \\ A_R \end{bmatrix}, \qquad A_j = \begin{bmatrix} {}_{1}x_j^t \\ \vdots \\ {}_{\ell}x_j^t \\ \vdots \\ {}_{n_j}x_j^t \end{bmatrix} \tag{5.10}$$

where $A_j$ is an $n_j$ by n submatrix having as its rows $n_j$ transposed
pattern vectors ${}_{\ell}x_j^t$ of class $C_j$, $(\ell = 1, 2, \ldots, n_j)$, and
$N = n_1 + n_2 + \cdots + n_R$.

Designate the n by (R-1) weight matrix U as composed of (R-1) column
vectors $u_q$, $(q = 1, 2, \ldots, R-1)$.

$$U = [\, u_1 \mid \cdots \mid u_q \mid \cdots \mid u_{R-1} \,] \tag{5.11}$$

Also define an N by (R-1) matrix B as

$$B = \begin{bmatrix} B_1 \\ \vdots \\ B_j \\ \vdots \\ B_R \end{bmatrix}, \qquad B_j = \begin{bmatrix} {}_{1}b_j^t \\ \vdots \\ {}_{\ell}b_j^t \\ \vdots \\ {}_{n_j}b_j^t \end{bmatrix} \tag{5.12}$$

whose row vectors ${}_{\ell}b_j^t$, $(j = 1, 2, \ldots, R;\ \ell = 1, 2, \ldots, n_j)$,
correspond to the class groupings in the A matrix and satisfy the
following inequalities

$${}_{\ell}b_j^t (e_j - e_i) > 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R. \tag{5.13}$$

$B_j$ is an $n_j$ by (R-1) submatrix of $B$, $j = 1, 2, \ldots, R$. Let an N by (R-1)
matrix Y be defined as

$$Y = A U - B. \tag{5.14}$$



The representation of Y may be in the form of either an array of (R-1)
column vectors $y_q$, $(q = 1, 2, \ldots, R-1)$,

$$Y = [\, y_1 \mid \cdots \mid y_q \mid \cdots \mid y_{R-1} \,] \tag{5.15}$$

or an array of N row vectors ${}_{\ell}Y_j$, $(j = 1, 2, \ldots, R;\ \ell = 1, 2, \ldots, n_j)$,
corresponding to the class groupings in the A matrix,

$$Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_j \\ \vdots \\ Y_R \end{bmatrix}, \qquad Y_j = \begin{bmatrix} {}_{1}Y_j \\ \vdots \\ {}_{\ell}Y_j \\ \vdots \\ {}_{n_j}Y_j \end{bmatrix} \tag{5.16}$$

where $Y_j$ is an $n_j$ by (R-1) submatrix of Y,

$$Y_j = A_j U - B_j \tag{5.17}$$

or

$${}_{\ell}Y_j = {}_{\ell}x_j^t U - {}_{\ell}b_j^t, \qquad j = 1, 2, \ldots, R;\ \ell = 1, 2, \ldots, n_j. \tag{5.18}$$



The set of linear inequalities which will be discussed in this chapter
is, from (5.8),

$$A_j U (e_j - e_i) > 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R. \tag{5.19}$$

Associated with it is another set of linear inequalities

$$Y_j (e_j - e_i) = (A_j U - B_j)(e_j - e_i) > 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R \tag{5.20}$$

or

$${}_{\ell}Y_j (e_j - e_i) = ({}_{\ell}x_j^t U - {}_{\ell}b_j^t)(e_j - e_i) > 0 \quad \text{for all } i \ne j,\ \text{all } j = 1, 2, \ldots, R,\ \text{all } \ell = 1, 2, \ldots, n_j. \tag{5.21}$$

Since, by (5.13), $B_j (e_j - e_i)$ is constrained to have positive components
for all $i \ne j$, inequalities (5.20) or (5.21) imply the inequalities
(5.19) and hence (5.1) or (5.2). When inequalities (5.19) are satisfied
for all $i \ne j$ and for all $j = 1, 2, \ldots, R$, a solution weight matrix U is
reached which will give linear classification of R-class patterns; that is,
if

$$x^t U (e_j - e_i) > 0 \quad \text{for all } i \ne j$$

then $x$ is classified as of class $C_j$. Also, if R weight vectors $w_j$,
$j = 1, 2, \ldots, R$, are computed from U according to (5.9), then R discriminant
functions, $g_j(x) = x^t w_j$, $(j = 1, 2, \ldots, R)$, can be obtained for use in the
R-class pattern recognizer shown in Figure 9.



B. Development of the Algorithm 



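For notational simplicity in the derivation of the gradient
function to be developed below, let the matrices A, U, B and Y in
equations (5.10), (5.11), (5.12), and (5.15) be represented respectively
as

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{Nn} \end{bmatrix} \tag{5.22}$$

Figure 9. Block Diagram of a Multiclass Pattern Recognizer.

$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1,R-1} \\ \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{n,R-1} \end{bmatrix} \tag{5.23}$$

$$B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1,R-1} \\ \vdots & \vdots & & \vdots \\ b_{N1} & b_{N2} & \cdots & b_{N,R-1} \end{bmatrix} \tag{5.24}$$

and

$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1,R-1} \\ \vdots & \vdots & & \vdots \\ y_{N1} & y_{N2} & \cdots & y_{N,R-1} \end{bmatrix} \tag{5.25}$$

Substituting these into equation (5.14), one obtains

$$y_{ij} = \sum_{k=1}^{n} a_{ik} u_{kj} - b_{ij}. \tag{5.26}$$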



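Let C(Y) be an N by (R-1) matrix defined by

$$C(Y) = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1,R-1} \\ \vdots & \vdots & & \vdots \\ c_{N1} & c_{N2} & \cdots & c_{N,R-1} \end{bmatrix} = \begin{bmatrix} \cosh \tfrac{1}{2} y_{11} & \cosh \tfrac{1}{2} y_{12} & \cdots & \cosh \tfrac{1}{2} y_{1,R-1} \\ \vdots & \vdots & & \vdots \\ \cosh \tfrac{1}{2} y_{N1} & \cosh \tfrac{1}{2} y_{N2} & \cdots & \cosh \tfrac{1}{2} y_{N,R-1} \end{bmatrix} \tag{5.27}$$

The criterion function J(Y) to be minimized is chosen as the trace of
$4\,C^t(Y)\,C(Y)$,

$$J(Y) = \mathrm{Tr}(4\,C^t C) = 4 \sum_{i=1}^{N} \sum_{j=1}^{R-1} c_{ij}^2 = 4 \sum_{i=1}^{N} \sum_{j=1}^{R-1} (\cosh \tfrac{1}{2} y_{ij})^2 = \sum_{i=1}^{N} \sum_{j=1}^{R-1} J_{ij}(Y) \tag{5.28}$$

where

$$J_{ij}(Y) = 4 (\cosh \tfrac{1}{2} y_{ij})^2. \tag{5.29}$$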



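Following the same approach as in the dichotomous case, determine the
gradients of J(Y) with respect to both U and B.

$$\frac{\partial J_{ij}(Y)}{\partial U} = 4 \left(\cosh \tfrac{1}{2} y_{ij} \, \sinh \tfrac{1}{2} y_{ij}\right) \frac{\partial y_{ij}}{\partial U} = 2 \sinh y_{ij} \, \frac{\partial y_{ij}}{\partial U} \tag{5.30}$$

where the derivative of a scalar with respect to a matrix is a matrix.
From (5.26) and (5.30),

$$\frac{\partial J_{ij}(Y)}{\partial U} = 2 \sinh y_{ij} \begin{bmatrix} 0 & \cdots & 0 & a_{i1} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & a_{i2} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & a_{in} & 0 & \cdots & 0 \end{bmatrix} \tag{5.31}$$

in which the nonzero entries occupy the jth column.

Then the gradient of the criterion function, J(Y), with respect to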
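the matrix U is

$$\frac{\partial J(Y)}{\partial U} = 2 \begin{bmatrix} \sum_{i=1}^{N} a_{i1} \sinh y_{i1} & \cdots & \sum_{i=1}^{N} a_{i1} \sinh y_{ij} & \cdots & \sum_{i=1}^{N} a_{i1} \sinh y_{i,R-1} \\ \sum_{i=1}^{N} a_{i2} \sinh y_{i1} & \cdots & \sum_{i=1}^{N} a_{i2} \sinh y_{ij} & \cdots & \sum_{i=1}^{N} a_{i2} \sinh y_{i,R-1} \\ \vdots & & \vdots & & \vdots \\ \sum_{i=1}^{N} a_{in} \sinh y_{i1} & \cdots & \sum_{i=1}^{N} a_{in} \sinh y_{ij} & \cdots & \sum_{i=1}^{N} a_{in} \sinh y_{i,R-1} \end{bmatrix}$$

$$= 2 \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{N1} \\ a_{12} & a_{22} & \cdots & a_{N2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{Nn} \end{bmatrix} \begin{bmatrix} \sinh y_{11} & \sinh y_{12} & \cdots & \sinh y_{1,R-1} \\ \sinh y_{21} & \sinh y_{22} & \cdots & \sinh y_{2,R-1} \\ \vdots & \vdots & & \vdots \\ \sinh y_{N1} & \sinh y_{N2} & \cdots & \sinh y_{N,R-1} \end{bmatrix} = 2 A^t S(Y) \tag{5.32}$$

where S(Y) is an N by (R-1) matrix with the following representation,

$$S(Y) = \begin{bmatrix} \sinh y_{11} & \sinh y_{12} & \cdots & \sinh y_{1,R-1} \\ \vdots & \vdots & & \vdots \\ \sinh y_{N1} & \sinh y_{N2} & \cdots & \sinh y_{N,R-1} \end{bmatrix} = \begin{bmatrix} S_1(Y) \\ \vdots \\ S_j(Y) \\ \vdots \\ S_R(Y) \end{bmatrix}, \qquad S_j(Y) = \begin{bmatrix} {}_{1}S_j(Y) \\ \vdots \\ {}_{n_j}S_j(Y) \end{bmatrix} \tag{5.33}$$

where ${}_{\ell}S_j(Y)$ is a row vector of the following form

$${}_{\ell}S_j(Y) = [\, \sinh {}_{\ell}y_{j1}, \ \ldots, \ \sinh {}_{\ell}y_{j,R-1} \,], \tag{5.34}$$

and ${}_{\ell}y_{jq}$ denotes the element of Y in the $\ell$-th row of the block $Y_j$
and the q-th column.

From (5.29),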



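$$\frac{\partial J_{ij}(Y)}{\partial B} = 4 \left(\cosh \tfrac{1}{2} y_{ij} \, \sinh \tfrac{1}{2} y_{ij}\right) \frac{\partial y_{ij}}{\partial B} = 2 \sinh y_{ij} \, \frac{\partial y_{ij}}{\partial B}. \tag{5.35}$$

From (5.26), $\partial y_{ij} / \partial B$ is the N by (R-1) matrix whose only nonzero
element is $-1$ in the i-th row and j-th column,

$$\frac{\partial y_{ij}}{\partial B} = \begin{bmatrix} 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & -1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}. \tag{5.36}$$

Substituting (5.36) into (5.35) gives

$$\frac{\partial J_{ij}(Y)}{\partial B} = -2 \begin{bmatrix} 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & \sinh y_{ij} & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}. \tag{5.37}$$

Hence, the gradient of the criterion function J(Y), with respect to
the matrix B is

$$\frac{\partial J(Y)}{\partial B} = -2 \begin{bmatrix} \sinh y_{11} & \sinh y_{12} & \cdots & \sinh y_{1,R-1} \\ \vdots & \vdots & & \vdots \\ \sinh y_{N1} & \sinh y_{N2} & \cdots & \sinh y_{N,R-1} \end{bmatrix} = -2\, S(Y). \tag{5.38}$$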
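
The closed-form gradients (5.32) and (5.38) can be compared against finite differences. The block below is a minimal Python sketch assuming NumPy; the function names `criterion`, `gradients`, and `numeric_gradient` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def criterion(A, U, B):
    """J(Y) = 4 * sum_ij cosh(y_ij / 2)^2 with Y = A U - B, eqs. (5.14) and (5.28)."""
    Y = A @ U - B
    return 4.0 * np.sum(np.cosh(0.5 * Y) ** 2)

def gradients(A, U, B):
    """Closed forms (5.32) and (5.38): dJ/dU = 2 A^t S(Y) and dJ/dB = -2 S(Y)."""
    S = np.sinh(A @ U - B)
    return 2.0 * A.T @ S, -2.0 * S

def numeric_gradient(f, M, h=1e-6):
    """Central-difference gradient of the scalar function f with respect to matrix M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        step = np.zeros_like(M)
        step[idx] = h
        G[idx] = (f(M + step) - f(M - step)) / (2.0 * h)
    return G
```

Evaluating `gradients` and `numeric_gradient` on small random matrices should give matching results up to the finite-difference error, which is one way to confirm the factor $2 \sinh y_{ij}$ obtained in (5.30) and (5.35).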



Since U is not constrained in any manner, $\partial J(Y) / \partial U = 0$ implies that
$S(Y) = 0$, which, in turn, implies that $\sinh y_{ij} = 0$ and hence $y_{ij} = 0$
for all $i = 1, \ldots, N$ and $j = 1, 2, \ldots, R-1$. Therefore, for
$\partial J(Y) / \partial U = 0$ and a fixed B,

$$Y = A U - B = 0$$

which gives a least square fit of

$$U = A^{\#} B \tag{5.39}$$

where $A^{\#}$ is the generalized inverse of A.

On the other hand, for a fixed U and the constraint $B_j (e_j - e_i) > 0$ for
all $i \ne j$ as given in (5.13), B may be incremented according to the
following gradient descent procedure to reduce J(Y) at each step.

$$B(k+1) = B(k) + \delta B(k) \tag{5.40}$$



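where the q-th element, $\delta[{}_{\ell}b_{jq}(k)]$, of $\delta[{}_{\ell}b_j^t(k)]$ in $\delta B(k)$ is given by

$$\delta[{}_{\ell}b_{jq}(k)] = \begin{cases} -\rho(k) \, \dfrac{\partial J(Y)}{\partial \, {}_{\ell}b_{jq}} & \text{if } {}_{\ell}Y_j(k)(e_j - e_i) > 0 \text{ for any } i \ne j \\[2ex] 0 & \text{if } {}_{\ell}Y_j(k)(e_j - e_i) \le 0 \text{ for any } i \ne j. \end{cases}$$

However, this choice does not by itself guarantee
$\delta[{}_{\ell}b_j^t(k)](e_j - e_i) \ge 0$. In order to make
$\delta[{}_{\ell}b_j^t(k)](e_j - e_i) \ge 0$ so that (5.13) can be satisfied at each step, a
modified gradient descent procedure is to be used. Let an (R-1) by (R-1)
non-singular matrix $E_j$ be defined as

$$E_j = [\, e_j - e_1, \ \ldots, \ e_j - e_{j-1}, \ e_j - e_{j+1}, \ \ldots, \ e_j - e_R \,]. \tag{5.41}$$

Also define

$$Z_j = Y_j E_j \quad \text{for all } j = 1, 2, \ldots, R. \tag{5.42}$$

The increment is then given in terms of $Z_j$. With ${}_{\ell}Z_{jq}(k)$ the q-th
element of the row ${}_{\ell}Z_j(k) = {}_{\ell}Y_j(k) E_j$, the q-th element of
$\delta[{}_{\ell}b_j^t(k)] E_j$ is

$$\begin{cases} 2 \rho(k) \, {}_{\ell}S_{jq}(Z(k)) = \rho(k) [\, {}_{\ell}S_{jq}(Z(k)) + {}_{\ell}\Lambda_{jq}(k) \,] & \text{if } {}_{\ell}Z_{jq}(k) > 0 \\[1ex] 0 & \text{if } {}_{\ell}Z_{jq}(k) \le 0 \end{cases} \tag{5.43}$$

where

$${}_{\ell}\Lambda_{jq}(k) = {}_{\ell}S_{jq}(Z(k)) \, \mathrm{Sgn}({}_{\ell}Z_{jq}(k)) \tag{5.44}$$

and, following (5.33),

$${}_{\ell}S_{jq}(Z(k)) = \sinh {}_{\ell}Z_{jq}(k). \tag{5.45}$$

Putting into vector representation,

$$\delta[{}_{\ell}b_j^t(k)] \, E_j = \rho(k) [\, {}_{\ell}S_j(Z(k)) + {}_{\ell}\Lambda_j(k) \,]$$

or

$$\delta[{}_{\ell}b_j^t(k)] = \rho(k) [\, {}_{\ell}S_j(Z(k)) + {}_{\ell}\Lambda_j(k) \,] E_j^{-1} = \rho(k) \, {}_{\ell}H_j(Y(k)) \tag{5.46}$$

where

$${}_{\ell}H_j(Y(k)) = [\, {}_{\ell}S_j(Z(k)) + {}_{\ell}\Lambda_j(k) \,] E_j^{-1}, \qquad H_j(Y(k)) = [\, S_j(Z(k)) + \Lambda_j(k) \,] E_j^{-1}, \tag{5.47}$$

$$H(Y(k)) = \begin{bmatrix} H_1(Y(k)) \\ \vdots \\ H_j(Y(k)) \\ \vdots \\ H_R(Y(k)) \end{bmatrix}, \quad H_j(Y(k)) = \begin{bmatrix} {}_{1}H_j(Y(k)) \\ \vdots \\ {}_{n_j}H_j(Y(k)) \end{bmatrix}, \quad H(Y(k)) = [\, h_1(Y(k)) \ \cdots \ h_q(Y(k)) \ \cdots \ h_{R-1}(Y(k)) \,]. \tag{5.48}$$

It follows from (5.46) and (5.44) that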



$$\delta[{}_{\ell}b_j^t(k)](e_j - e_i) \ge 0 \quad \text{for all } i \ne j \text{ and for all } j.$$

Then

$$\delta[B(k)] = \rho(k) \, H(Y(k)).$$

Substituting the above equation into (5.40), one has

$$B(k+1) = B(k) + \rho(k) \, H(Y(k)). \tag{5.49}$$

Using the above equation in (5.39), one has

$$U(k+1) = A^{\#} B(k+1) = A^{\#} \{ B(k) + \rho(k) H[Y(k)] \} = U(k) + \rho(k) A^{\#} H[Y(k)]. \tag{5.50}$$

Therefore, an iterative algorithm to solve for U can be proposed as
follows:

$$\begin{aligned} U(0) &= A^{\#} B(0) \\ Y(k) &= A\,U(k) - B(k), \qquad Z_j(k) = Y_j(k) E_j \\ B(k+1) &= B(k) + \rho(k) H[Y(k)], \qquad H_j(Y(k)) = [\, S_j(Z(k)) + \Lambda_j(k) \,] E_j^{-1} \\ U(k+1) &= U(k) + \rho(k) A^{\#} H[Y(k)] \end{aligned} \tag{5.51}$$



where $\rho(k)$ may be chosen as equal to

$$\rho(k) = \frac{\displaystyle \sum_{j=1}^{R} \sum_{\ell=1}^{n_j} \left\{ {}_{\ell}c_j(k) + {}_{\ell}H_j(Y(k)) (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j(k)) E_j^t \, {}_{\ell}H_j^t(Y(k)) \right\}}{\displaystyle 2 \sum_{q=1}^{R-1} h_q^t (I - A A^{\#}) h_q} \tag{5.52}$$

provided that

$$\sum_{j=1}^{R} \sum_{\ell=1}^{n_j} \left\{ {}_{\ell}c_j(k) + {}_{\ell}H_j(Y(k)) (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j(k)) E_j^t \, {}_{\ell}H_j^t(Y(k)) \right\} > 0 \tag{5.53}$$

where ${}_{\ell}c_j(k)$ and $R({}_{\ell}Z_j(k))$ are defined in (5.62) and (5.60)
respectively, as will be shown later. The initial B matrix, B(0), may be
chosen from

$${}_{\ell}b_j^t(0) = \beta \, e_j^t, \qquad \beta > 0, \qquad (j = 1, 2, \ldots, R;\ \ell = 1, 2, \ldots, n_j) \tag{5.54}$$

where $e_j$ are vertex vectors of the (R-1)-dimensional simplex and $\beta$ is an
arbitrary positive constant. A recursive relation in Y(k) is also
obtained as follows:

$$Y(k+1) = Y(k) + \rho(k) (A A^{\#} - I) H[Y(k)]. \tag{5.55}$$

Comparing (5.51) and (2.29), it is evident that the above algorithm for
multiclass pattern classification is a generalization of the dichotomy
algorithm developed in Chapter II.
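
The iteration (5.51) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the dissertation's program: NumPy is assumed, `numpy.linalg.pinv` stands in for the generalized inverse $A^{\#}$, the names `simplex_vertices` and `multiclass_accelerated` are hypothetical, and classes are indexed from 0. The step size is computed as $\mathrm{Tr}(H^t Y) / \mathrm{Tr}(H^t (I - A A^{\#}) H)$, the value that minimizes the quadratic in $\rho$ appearing later in (5.58); it is assumed here to agree with (5.52), since (5.61) rewrites $2\,\mathrm{Tr}(H^t Y)$ as the numerator of (5.52).

```python
import numpy as np

def simplex_vertices(R):
    """Rows are the vertex vectors e_j of eq. (5.5)."""
    e = np.zeros((R, R - 1))
    for i in range(1, R):
        val = np.sqrt(R * (R - i) / ((R - 1) * (R - i + 1)))
        e[i - 1, i - 1] = val
        e[i:, i - 1] = -val / (R - i)
    return e

def multiclass_accelerated(A, labels, R, beta=0.5, max_iter=100, tol=1e-12):
    """Sketch of iteration (5.51); labels[i] in {0, ..., R-1} is the class of row i of A."""
    labels = np.asarray(labels)
    e = simplex_vertices(R)
    # E_j of eq. (5.41): columns e_j - e_i for every i != j.
    E = [np.column_stack([e[j] - e[i] for i in range(R) if i != j]) for j in range(R)]
    E_inv = [np.linalg.inv(Ej) for Ej in E]
    A_pinv = np.linalg.pinv(A)                   # stands in for A^#
    P = np.eye(A.shape[0]) - A @ A_pinv          # I - A A^#
    B = beta * e[labels]                         # B(0), eq. (5.54)
    U = A_pinv @ B                               # U(0) = A^# B(0)
    V_history = []
    for _ in range(max_iter):
        Y = A @ U - B
        V_history.append(float(np.sum(Y * Y)))   # V = Tr(Y^t Y)
        H = np.zeros_like(Y)
        for j in range(R):
            rows = labels == j
            S = np.sinh(Y[rows] @ E[j])          # S_j(Z) with Z_j = Y_j E_j
            H[rows] = (S + np.abs(S)) @ E_inv[j] # [S_j + Lambda_j] E_j^{-1}
        num = float(np.sum(H * Y))               # Tr(H^t Y)
        den = float(np.sum(H * (P @ H)))         # Tr(H^t (I - A A^#) H)
        if num <= 0.0 or den <= tol:
            break                                # converged, or no admissible step
        rho = num / den                          # assumed equivalent to (5.52)
        B = B + rho * H
        U = U + rho * (A_pinv @ H)
    return U, B, V_history
```

Since $S \cdot \mathrm{Sgn}(Z) = |\sinh Z|$, the term $S + \Lambda$ is written as `S + np.abs(S)`. The recorded values of $\mathrm{Tr}(Y^t Y)$ should be non-increasing from step to step, and a returned U satisfying $A_j U (e_j - e_i) > 0$ for all $i \ne j$ separates the sample patterns.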

C. Theorem 2

In order to prove the convergence of the algorithm (5.51), the
following discussion is necessary.

Lemma 2. Consider the set of inequalities (5.19) and the algorithm
(5.51) to solve it. Then

1) $Y_j(k)(e_j - e_i) \not> 0$ for all $i \ne j$, for all $j = 1, 2, \ldots, R$,
for any k.

2) If (5.19) is consistent, then
$Y_j(k)(e_j - e_i) \not\le 0$ for all $i \ne j$, for all $j = 1, 2, \ldots, R$,
for any k.

Proof.

1) Let

$$Y_j(k)(e_j - e_i) > 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R,\ \text{for some } k.$$

Since

$$B_j(k)(e_j - e_i) > 0 \quad \text{for all } i \ne j,$$

then

$$(e_j - e_i)^t \, Y_j^t(k) B_j(k) \, (e_j - e_i) > 0 \quad \text{for all } i \ne j,$$

$$Y_j^t(k) B_j(k) > 0 \quad \text{for all } j = 1, 2, \ldots, R,$$

and it follows that

$$Y^t(k) B(k) > 0.$$

But

$$Y(k) = (A A^{\#} - I) B(k),$$

$$Y^t(k) B(k) = B^t(k) (A A^{\#} - I) B(k) \le 0$$

since

$$(A A^{\#} - I) \le 0.$$

This is a contradiction. Hence

$$Y_j(k)(e_j - e_i) \not> 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R,\ \text{for any } k.$$

2) Assume that (5.19) is consistent but

$$Y_j(k)(e_j - e_i) \le 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R,\ \text{for some } k.$$

Consistency of (5.19) implies the existence of a $U^*$ and a $B^*$ such that

$$A_j U^* (e_j - e_i) = B_j^* (e_j - e_i) > 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R.$$

Therefore

$$(e_j - e_i)^t \, Y_j^t(k) B_j^* \, (e_j - e_i) < 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R$$

and

$$Y^t(k) B^* = \sum_{j=1}^{R} Y_j^t(k) B_j^* < 0.$$

But for any Y(k),

$$A^t Y(k) = A^t (A U(k) - B(k)) = A^t (A A^{\#} - I) B(k) = (A^t A A^{\#} - A^t) B(k) = 0 \cdot B(k) = 0,$$

thus

$$U^{*t} A^t Y(k) = 0$$

or

$$Y^t(k) A U^* = Y^t(k) B^* = \sum_{j=1}^{R} Y_j^t(k) B_j^* = 0$$

which is a contradiction. Hence, if (5.19) is consistent,

$$Y_j(k)(e_j - e_i) \not\le 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R,\ \text{for any } k.$$

Theorem 2. Consider the set of linear inequalities (5.19) and the
algorithm (5.51) to solve them, and let

$$V[Y(k)] = \|Y(k)\|^2 = \mathrm{Tr}[Y^t(k) Y(k)].$$

1) If the set of linear inequalities is consistent, then

a) $\Delta V[Y(k)] = V[Y(k+1)] - V[Y(k)] < 0$ and
$\lim_{k \to \infty} V[Y(k)] = 0$, implying convergence to a solution
in an infinite number of iterations; and

b) Actually, a solution is obtained in a finite number of
steps.

2) If the set of linear inequalities is inconsistent, then there
exists a positive integer $k^*$ such that

$$\Delta V[Y(k)] < 0 \ \text{ for } k < k^*, \qquad \Delta V[Y(k)] = 0 \ \text{ for } k \ge k^*,$$

$${}_{\ell}Y_j(k)(e_j - e_i) \not\le 0 \ \text{ for } k < k^*, \qquad {}_{\ell}Y_j(k)(e_j - e_i) \le 0 \ \text{ for all } k \ge k^*,$$

for all $i \ne j$ and for all $j = 1, 2, \ldots, R$, and

$$U(k) = U(k^*) \ \text{ for } k \ge k^*, \qquad B(k) = B(k^*) \ \text{ for } k \ge k^*.$$

In other words, the occurrence of a matrix Y(k) with all non-positive

elements of $Y_j(k)(e_j - e_i)$ for all $i \ne j$ and all j at any step terminates
the algorithm and indicates the nonlinear separability of the R-class patterns.

Proof.

The proof of this theorem is similar to the convergence proof
of the generalization of the Ho-Kashyap algorithm to multiclass pattern
classification (23).

Part 1:

With reference to the recursive relation in Y(k) given by
(5.55), V[Y(k)] can be considered as a Liapunov function,

$$V[Y(k)] = \mathrm{Tr}[Y^t(k) Y(k)] > 0 \quad \text{for all } Y(k) \ne 0. \tag{5.56}$$

Now,

$$\Delta V[Y(k)] = V[Y(k+1)] - V[Y(k)] = \mathrm{Tr}[Y^t(k+1) Y(k+1) - Y^t(k) Y(k)]. \tag{5.57}$$

Since

$$\begin{aligned} Y^t(k+1) Y(k+1) - Y^t(k) Y(k) &= [\, Y^t(k) + \rho(k) H^t(Y(k)) (A A^{\#} - I) \,] [\, Y(k) + \rho(k) (A A^{\#} - I) H(Y(k)) \,] - Y^t(k) Y(k) \\ &= \rho(k) H^t[Y(k)] (A A^{\#} - I) Y(k) + \rho(k) Y^t(k) (A A^{\#} - I) H[Y(k)] \\ &\quad + \rho^2(k) H^t[Y(k)] (I - A A^{\#}) H[Y(k)], \end{aligned}$$

and

$$A A^{\#} Y(k) = A A^{\#} [A U(k) - B(k)] = A A^{\#} [A A^{\#} B(k) - B(k)] = A A^{\#} B(k) - A A^{\#} B(k) = 0,$$

then

$$\begin{aligned} \Delta V[Y(k)] &= \mathrm{Tr}\{ -2 \rho(k) H^t[Y(k)] Y(k) + \rho^2(k) H^t[Y(k)] (I - A A^{\#}) H[Y(k)] \} \\ &= -2 \rho(k) \, \mathrm{Tr}\{ H^t[Y(k)] Y(k) \} + \rho^2(k) \, \mathrm{Tr}\{ H^t[Y(k)] (I - A A^{\#}) H[Y(k)] \} \\ &= -2 \rho(k) \, \mathrm{Tr}\{ H[Y(k)] Y^t(k) \} + \rho^2(k) \sum_{q=1}^{R-1} h_q^t[Y(k)] (I - A A^{\#}) h_q[Y(k)] \\ &= -2 \rho(k) \sum_{j=1}^{R} \sum_{\ell=1}^{n_j} {}_{\ell}H_j(Y(k)) \, {}_{\ell}Y_j^t(k) + \rho^2(k) \sum_{q=1}^{R-1} h_q^t(Y(k)) (I - A A^{\#}) h_q(Y(k)). \end{aligned} \tag{5.58}$$



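From (5.45) and (5.33),

$${}_{\ell}S_j(Z) = [\, \sinh {}_{\ell}Z_{j1}, \ \ldots, \ \sinh {}_{\ell}Z_{j,R-1} \,] = {}_{\ell}Z_j \, R({}_{\ell}Z_j) \tag{5.59}$$

where $R({}_{\ell}Z_j)$ is the (R-1) by (R-1) diagonal matrix

$$R({}_{\ell}Z_j) = \mathrm{diag}[\, r({}_{\ell}Z_{jq}) \,], \qquad r({}_{\ell}Z_{jq}) = \frac{\sinh {}_{\ell}Z_{jq}}{{}_{\ell}Z_{jq}} \ge 1, \tag{5.60}$$

$(j = 1, 2, \ldots, R;\ \ell = 1, 2, \ldots, n_j;\ q = 1, \ldots, R-1)$.

Substituting (5.59) into (5.47),

$${}_{\ell}H_j(Y(k)) = [\, {}_{\ell}Z_j(k) R({}_{\ell}Z_j(k)) + {}_{\ell}\Lambda_j(k) \,] E_j^{-1};$$

then, writing $\rho$ for $\rho(k)$ and suppressing the argument k,

$$\begin{aligned} -2 \rho \sum_{j} \sum_{\ell} {}_{\ell}H_j \, {}_{\ell}Y_j^t &= -2 \rho \sum_{j} \sum_{\ell} {}_{\ell}H_j \, ({}_{\ell}Z_j E_j^{-1})^t \\ &= -2 \rho \sum_{j} \sum_{\ell} [\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,] E_j^{-1} (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j) [\, {}_{\ell}Z_j R({}_{\ell}Z_j) \,]^t \\ &= -\rho \sum_{j} \sum_{\ell} [\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,] E_j^{-1} (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j) [\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,]^t \\ &\quad - \rho \sum_{j} \sum_{\ell} [\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,] E_j^{-1} (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j) [\, {}_{\ell}Z_j R({}_{\ell}Z_j) - {}_{\ell}\Lambda_j \,]^t \\ &= -\rho \sum_{j} \sum_{\ell} {}_{\ell}H_j(Y(k)) (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j) E_j^t \, {}_{\ell}H_j^t(Y(k)) \\ &\quad - \rho \sum_{j} \sum_{\ell} [\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,] (E_j^t E_j)^{-1} R^{-1}({}_{\ell}Z_j) [\, {}_{\ell}Z_j R({}_{\ell}Z_j) - {}_{\ell}\Lambda_j \,]^t. \end{aligned} \tag{5.61}$$

Since the off-diagonal elements in $(E_j^t E_j)^{-1}$ are negative and $R^{-1}({}_{\ell}Z_j)$
is a diagonal matrix with all positive diagonal elements, the off-diagonal
elements of $(E_j^t E_j)^{-1} R^{-1}({}_{\ell}Z_j)$ are also negative. From (5.44), (5.47), and
(5.49), the elements of $[\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,]$ are either positive or zero, and
the corresponding elements of $[\, {}_{\ell}Z_j R({}_{\ell}Z_j) - {}_{\ell}\Lambda_j \,]$ are either zero or negative.
Hence,

$${}_{\ell}c_j(k) = [\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,] (E_j^t E_j)^{-1} R^{-1}({}_{\ell}Z_j) [\, {}_{\ell}Z_j R({}_{\ell}Z_j) - {}_{\ell}\Lambda_j \,]^t \ge 0$$

for all j and all $\ell$. (5.62)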
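
The claim that the off-diagonal elements of $(E_j^t E_j)^{-1}$ are negative can be made explicit. The following short derivation is a sketch added for illustration; it uses only the simplex properties (5.3) and the definition (5.41), and the symbols $m$ and $\mathbf{1}$ are introduced here and are not part of the original text.

```latex
% Assumptions: unit-norm vertices, zero centroid, equal pairwise distances (5.3).
% Equal distances and unit norms give one common inner product c = e_i^t e_k (i \ne k):
0 = \Big\| \sum_{j=1}^{R} e_j \Big\|^2 = R + R(R-1)\,c
\quad\Longrightarrow\quad
e_i^t e_k = -\frac{1}{R-1}, \qquad i \ne k .

% Entries of E_j^t E_j, whose columns are e_j - e_i (i \ne j):
(e_j - e_i)^t (e_j - e_k) =
\begin{cases}
  \dfrac{2R}{R-1}, & i = k, \\[1.5ex]
  \dfrac{R}{R-1},  & i \ne k,
\end{cases}
\qquad\text{so}\qquad
E_j^t E_j = \frac{R}{R-1}\,\bigl( I + \mathbf{1}\mathbf{1}^t \bigr),

% with \mathbf{1} the all-ones vector of length m = R-1. Since \mathbf{1}^t\mathbf{1} = R-1,
(E_j^t E_j)^{-1} = \frac{R-1}{R}\,\Bigl( I - \frac{1}{R}\,\mathbf{1}\mathbf{1}^t \Bigr),
\qquad
\bigl[(E_j^t E_j)^{-1}\bigr]_{ik} = -\frac{R-1}{R^2} < 0 \quad (i \ne k).
```

The diagonal terms do not contribute to ${}_{\ell}c_j(k)$, because for each q at most one of the q-th elements of $[{}_{\ell}Z_j R + {}_{\ell}\Lambda_j]$ and $[{}_{\ell}Z_j R - {}_{\ell}\Lambda_j]$ is nonzero.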



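Substituting (5.62) into (5.61) which, in turn, is substituted into (5.58),
one obtains

$$\begin{aligned} \Delta V[Y(k)] &= -\rho(k) \sum_{j} \sum_{\ell} {}_{\ell}H_j(Y(k)) (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j) E_j^t \, {}_{\ell}H_j^t(Y(k)) - \rho(k) \sum_{j} \sum_{\ell} {}_{\ell}c_j(k) + \rho^2(k) \sum_{q} h_q^t (I - A A^{\#}) h_q \\ &= -\rho(k) \sum_{j} \sum_{\ell} \left\{ {}_{\ell}c_j(k) + {}_{\ell}H_j(Y(k)) (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j) E_j^t \, {}_{\ell}H_j^t(Y(k)) \right\} + \rho^2(k) \sum_{q} h_q^t(Y(k)) (I - A A^{\#}) h_q(Y(k)) \\ &= -\rho(k) \sum_{j=1}^{R} \sum_{\ell=1}^{n_j} {}_{\ell}c_j(k) - \rho^2(k) \sum_{q=1}^{R-1} h_q^t(Y(k)) A A^{\#} h_q(Y(k)) \\ &\quad - \rho(k) \sum_{j=1}^{R} \sum_{\ell=1}^{n_j} {}_{\ell}H_j(Y(k)) \left\{ (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j) E_j^t - \rho(k) I \right\} {}_{\ell}H_j^t(Y(k)). \end{aligned} \tag{5.63}$$

$\Delta V(Y(k))$ is negative definite if the right hand side of the above equation
is negative definite in $[\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,]$. The first two terms on the right
hand side are negative semi-definite. If a value of $\rho(k)$ can be found such
that

$$\sum_{j=1}^{R} \sum_{\ell=1}^{n_j} {}_{\ell}H_j(Y(k)) \left\{ (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j) E_j^t - \rho(k) I \right\} {}_{\ell}H_j^t(Y(k)) > 0$$

then $\Delta V(Y(k))$ is negative definite in $[\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,]$. Note that when

$$\rho(k) = \frac{1}{\cosh Y_{\max}(k)}, \qquad Y_{\max}(k) = \max_{j, \ell, q} |{}_{\ell}Y_{jq}(k)|,$$

$[\, R^{-1}({}_{\ell}Z_j) - \rho(k) I \,]$ is positive definite and has real eigenvalues, as can be
shown by following (2.43) and (2.44); but it is not certain that
$(E_j^t)^{-1} [\, R^{-1}({}_{\ell}Z_j) - \rho(k) I \,] E_j^t$ can be positive definite for all j and all
$\ell$. Let $\rho(k)$ be so chosen as to maximize $-\Delta V[Y(k)]$ at each step; one
follows the procedure used in Section II-C to obtain a choice of $\rho(k)$ as
given in (5.52), provided (5.53) is satisfied to make sure that $\rho(k) > 0$.
For this value of $\rho(k)$,

$$\Delta V(Y(k)) = - \frac{\left[ \displaystyle \sum_{j=1}^{R} \sum_{\ell=1}^{n_j} \left\{ {}_{\ell}c_j(k) + {}_{\ell}H_j(Y(k)) (E_j^t)^{-1} R^{-1}({}_{\ell}Z_j(k)) E_j^t \, {}_{\ell}H_j^t(Y(k)) \right\} \right]^2}{\displaystyle 4 \sum_{q=1}^{R-1} h_q^t (I - A A^{\#}) h_q} \le 0 \quad \text{for } [\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,] \ge 0.$$

Hence, $\Delta V[Y(k)]$ is negative definite in $[\, {}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j \,]$. Note that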

^ j ^ J ^ j 

${}_{\ell}Z_j R({}_{\ell}Z_j) + {}_{\ell}\Lambda_j = 0$ for all j and all $\ell$ only if ${}_{\ell}Z_j \le 0$, that is, only
if $Y(k) = 0$ or ${}_{\ell}Y_j(k)(e_j - e_i) \le 0$ for all $i \ne j$ and all j. Since it
is assumed that the set of inequalities (5.19) is consistent, from
the lemma $Y_j(k)(e_j - e_i) \not\le 0$; therefore

$$\Delta V[Y(k)] < 0 \ \text{ for all } Y(k) \ne 0, \qquad \Delta V[Y(k)] = 0 \ \text{ if } Y(k) = 0 \tag{5.64}$$

and the solution $Y = 0$ of equation (5.55) can be reached asymptotically,
that is

$$\lim_{k \to \infty} \|Y(k)\|^2 = 0$$

which corresponds to a solution $U^*$ with $A U^* = B^*$ such that
$A_j U^* (e_j - e_i) = B_j^* (e_j - e_i) > 0$ for all $i \ne j$ and for all j. This
completes the proof of Part 1(a).
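
The step size (5.52) and the resulting decrease of V can be read off from the second line of (5.63), which is a quadratic in $\rho(k)$. The derivation below is a worked sketch added for illustration; the shorthand $N(k)$ and $D(k)$ is introduced here and does not appear in the original text.

```latex
% Shorthand (assumed for this sketch only):
%   N(k) = double sum appearing in (5.53),
%   D(k) = \sum_{q=1}^{R-1} h_q^t (I - A A^{\#}) h_q \ge 0,
%          nonnegative because I - A A^{\#} is a symmetric idempotent matrix.
\Delta V[Y(k)] = -\rho(k)\,N(k) + \rho^2(k)\,D(k)

% Stationary point with respect to \rho(k), for D(k) > 0:
\frac{\partial\,\Delta V}{\partial \rho} = -N(k) + 2\rho(k)\,D(k) = 0
\quad\Longrightarrow\quad
\rho(k) = \frac{N(k)}{2\,D(k)}          % this is (5.52)

% Value of \Delta V at that step size:
\Delta V[Y(k)] = -\frac{N^2(k)}{2D(k)} + \frac{N^2(k)}{4D(k)}
               = -\frac{N^2(k)}{4\,D(k)} \;\le\; 0 .
```

Condition (5.53), $N(k) > 0$, is what keeps $\rho(k)$ positive so that B is incremented in the admissible direction.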

Note that if the B(0) given in (5.54) is chosen with $\beta = 1 + \varepsilon$,
$\varepsilon > 0$, so that

$${}_{\ell}b_j^t(0)(e_j - e_i) = (1 + \varepsilon) \, e_j^t (e_j - e_i) > 0 \quad \text{for all } i \ne j,\ \text{for all } j, \tag{5.65}$$

then the algorithm (5.51) gives

$${}_{\ell}b_j^t(k+1) = {}_{\ell}b_j^t(k) + \rho(k) [\, {}_{\ell}S_j(Z(k)) + {}_{\ell}\Lambda_j(k) \,] E_j^{-1}, \qquad (j = 1, 2, \ldots, R;\ \ell = 1, 2, \ldots, n_j) \tag{5.66}$$

which implies

$${}_{\ell}b_j^t(k+1)(e_j - e_i) \ge {}_{\ell}b_j^t(0)(e_j - e_i). \tag{5.67}$$

From (5.65), (5.66) and (5.67), by induction

$${}_{\ell}b_j^t(k+1)(e_j - e_i) \ge (1 + \varepsilon) \, e_j^t (e_j - e_i), \qquad \varepsilon > 0,$$

for all $i \ne j$, for all j, for all k, (5.68)

which satisfies the condition given in (5.13). Since

$$V[Y(k)] = \|Y(k)\|^2 = \sum_{j=1}^{R} \sum_{\ell=1}^{n_j} \|{}_{\ell}Y_j(k)\|^2 < 1$$

at a certain finite k, it implies that

$$\|{}_{\ell}Y_j(k)\|^2 < 1$$

and

$${}_{\ell}Y_j(k)(e_j - e_i) \ge -e_j^t (e_j - e_i) \quad \text{for all } i \ne j,\ \text{for all } j, \tag{5.69}$$

if k is sufficiently large.



Let $v$ be an $n_j$ by 1 vector whose components all equal unity,

$$v^t = [\, 1, 1, 1, \ldots, 1 \,]. \tag{5.70}$$

From (5.68) and (5.69),

$$B_j(k)(e_j - e_i) \ge (1 + \varepsilon) \, e_j^t (e_j - e_i) \, v \quad \text{for all } i \ne j \tag{5.71}$$

$$Y_j(k)(e_j - e_i) \ge -e_j^t (e_j - e_i) \, v \quad \text{for all } i \ne j. \tag{5.72}$$

Since

$$A_j U(k) = B_j(k) + Y_j(k)$$

it follows that

$$\begin{aligned} A_j U(k)(e_j - e_i) &= B_j(k)(e_j - e_i) + Y_j(k)(e_j - e_i) \\ &\ge (1 + \varepsilon) \, e_j^t (e_j - e_i) \, v - e_j^t (e_j - e_i) \, v \\ &\ge \varepsilon \, e_j^t (e_j - e_i) \, v \\ &> 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R \end{aligned} \tag{5.73}$$

which indicates that a solution $U = U(k)$ is obtained in a finite number of
steps. This completes the proof of Part 1(b).

Part 2:

If the set of inequalities (5.19) is inconsistent, Y(k) cannot
be 0 and hence V[Y(k)] cannot become zero for any $k \ge 0$. There must
exist a value of k, $k = k^*$, such that

$$\Delta V[Y(k)] < 0 \ \text{ for } 0 \le k < k^*, \qquad \Delta V[Y(k)] = 0 \ \text{ for } k = k^*.$$

But as shown in Part 1, $\Delta V[Y(k^*)] = 0$ only if either $Y(k^*) = 0$ or
${}_{\ell}Y_j(k^*)(e_j - e_i) \le 0$ for all $i \ne j$ and for all j. Since $Y(k^*) \ne 0$, this
implies that

$${}_{\ell}Y_j(k^*)(e_j - e_i) \le 0 \quad \text{for all } i \ne j,\ \text{for all } j;$$

hence, from (5.48), (5.44), and (5.51), (5.55) and (5.57), one has

$$\begin{aligned} H[Y(k)] &= 0 && \text{for all } k \ge k^* \\ B(k) &= B(k^*) && \text{for all } k \ge k^* \\ U(k) &= U(k^*) && \text{for all } k \ge k^* \\ Y(k) &= Y(k^*) && \text{for all } k \ge k^* \\ \Delta V[Y(k)] &= 0 && \text{for all } k \ge k^*. \end{aligned}$$

This completes the proof of Part 2.

Therefore, the algorithm (5.51), together with $\rho(k)$ given by
equation (5.52) under the condition (5.53) and with B(0) given by
equation (5.54), is a convergent algorithm for the solution U of the
set of linear inequalities (5.19). The nonlinear separability of the
multiclass patterns can also be detected by observing at a certain
step $k^*$

$${}_{\ell}Y_j(k^*)(e_j - e_i) \le 0 \quad \text{for all } i \ne j,\ \text{for all } j = 1, 2, \ldots, R.$$



VI. SUMMARY AND CONCLUSION 



In this dissertation, a new iterative algorithm has been
developed to solve for a solution $w$, if one exists, to a set of
linear inequalities, $A w > 0$, which arises in pattern dichotomization
and switching problems. It is an improvement of the Ho-Kashyap
algorithm based upon the attempt to minimize a different criterion
function $J(y) = 4 \sum_{i=1}^{N} (\cosh \tfrac{1}{2} y_i)^2$, where $y = A w - b$ and $b$ is a
vector with all positive components. This criterion function has
a larger gradient than the one used by Ho and Kashyap. The algorithm
is expressed in equation (2.29) with the incremental coefficient
$\rho(k)$ given by either equation (2.47) or equation (2.26). The
algorithm also simultaneously tests for the nonexistence of a
solution of the linear inequalities whenever $y \le 0$.
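
The dichotomous iteration summarized above can be sketched in Python. This is an illustrative sketch only, not the MAD program of Appendix A: NumPy is assumed, `numpy.linalg.pinv` stands in for the generalized inverse, the function name `accelerated_dichotomy` and its return codes are hypothetical, and the step size $\rho(k) = 1 / \cosh y_{\max}$ is the simpler of the two choices stated in the abstract.

```python
import numpy as np

def accelerated_dichotomy(A, b0=0.1, max_iter=1000):
    """Sketch of the two-class iteration: seek w with A w > 0.

    Returns (w, status), status in {"solved", "inconsistent", "max_iter"}.
    """
    A = np.asarray(A, dtype=float)
    A_pinv = np.linalg.pinv(A)                  # stands in for the generalized inverse
    b = np.full(A.shape[0], float(b0))          # b(0) > 0
    w = A_pinv @ b                              # w(0) = A^# b(0)
    for _ in range(max_iter):
        if np.all(A @ w > 0):
            return w, "solved"
        y = A @ w - b                           # y(k) = A w(k) - b(k)
        if np.all(y <= 0) and np.any(y < 0):
            return w, "inconsistent"            # nonexistence test: y <= 0, some y_i < 0
        h = np.sinh(y) + np.abs(np.sinh(y))     # h(k) = sinh y + |sinh y|
        rho = 1.0 / np.cosh(np.max(np.abs(y)))  # rho(k) = 1 / cosh(y_max)
        b = b + rho * h                         # b(k+1) = b(k) + rho h
        w = w + rho * (A_pinv @ h)              # w(k+1) = w(k) + rho A^# h
    return w, "max_iter"
```

For example, the single-column matrix with rows 1 and 2 is solved at once, while the matrix with rows 1 and -1 admits no w with $A w > 0$ and is reported as inconsistent on the first pass.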

This algorithm has a higher rate of convergence than previous
methods for a certain range of the choice of b(0). A comparison
has been made between this improved algorithm with $\rho(k)$ given by
equation (2.47) and the Ho-Kashyap algorithm with $\rho = 1$. The
convergence rate may be greatly increased for $0.001 < b_i(0) < 0.5$
$(i = 1, 2, \ldots, N)$, as verified by the computer results of switching
theory and pattern classification problems in Chapters III and IV. For
problems where a large number of iterations, for example, greater than
twenty, were required for the Ho-Kashyap algorithm, the proposed algorithm
reduced this number of iterations by a factor of 20 to 430. For problems
where a small number of iterations, for example, less than twenty, were
required by the Ho-Kashyap algorithm, the proposed algorithm reduced the
number of iterations by as much as 30 percent.

The generalization of the proposed algorithm for a solution matrix
U of a set of linear inequalities $A_j U (e_j - e_i) > 0$ (for all $i \ne j$ and
$j = 1, 2, \ldots, R$), which is applicable to multiclass pattern classification,
has been presented and a convergence proof has been given. This generalized
algorithm is expressed in equation (5.51) with $\rho(k)$ given by equation (5.52).
The convergence proof utilizes the concept of mapping the pattern classes
into vertices of an equilateral simplex whose vertex vectors are $e_j$,
$(j = 1, 2, \ldots, R)$.



The following six problems are suggested for further investigation:
(1) to study in detail the relationship between the rate
of convergence of the algorithm and the choice of $\rho(k)$ and b(0); (2) to
incorporate the proposed algorithm into the group-pattern adaptive
procedure for pattern classification; (3) to apply the proposed
algorithm for the development of an algorithm for piecewise linear
separation in cases where the sample patterns are not linearly separable;
(4) to develop explicit algorithms to solve for nonlinear discriminant
functions for some nonlinearly separable pattern recognition
problems; (5) to extend the algorithm so that it can assure all $w_i > 0$,
for threshold logic circuits realizable by transistors;
and (6) to develop a procedure to select an adequate set of pattern
attributes providing for reliability and flexibility in a teaching
machine.



APPENDIX A 



PROGRAM LISTING FOR THE SPECIAL
ALGORITHM OF EQUATION (3.5)

A MAD program listing is shown on the following pages
for the application of the accelerated algorithm to switching functions.
The program is devised so that the iterations for various initial
values of the b vector, b(0), can be performed successively. This is
done by inputting NVAL equal to the number of initial b vectors and
VALU(1) ... VALU(NVAL) equal to the initial values of the b vector.

The matrix A is read in by FORMAB. FORMAB inputs
ID = identification number
N = number of columns of A
M = number of rows of A
NA = number of elements of class 1
MB = number of elements of class 2

and the elements of class 1 and class 2 with the minterm expressed in
decimal form; elements of class 1 are entered first. The value of
$\rho(k)$ in equation (2.47) is used for the program listing.

APPENDIX B



PROGRAM LISTING FOR THE GENERALIZED
INVERSE OF A MATRIX

A program listing in MAD language is shown on the following
pages for the calculation of the generalized inverse of a matrix. The
program is written for M = number of rows greater than N = number of
columns. The matrix A is read in by rows, $a_{11}$ first.
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